Abstract. We study the relative value iteration for the ergodic control problem under a near-monotone running cost structure for a nondegenerate diffusion controlled through its drift. This algorithm takes the form of a quasilinear parabolic Cauchy initial value problem in $\mathbb{R}^d$. We show that this Cauchy problem stabilizes, or in other words, that the solution of the quasilinear parabolic equation converges for every bounded initial condition in $C^2(\mathbb{R}^d)$ to the solution of the Hamilton-Jacobi-Bellman (HJB) equation associated with the ergodic control problem.
1. Introduction. This paper is concerned with the time-asymptotic behavior of an optimal control problem for a nondegenerate diffusion controlled through its drift and described by an Itô stochastic differential equation (SDE) in $\mathbb{R}^d$ having the following form:

$$dX_t = b(X_t,U_t)\,dt + \sigma(X_t)\,dW_t. \tag{1.1}$$

Here $U_t$ is the control variable that takes values in some compact metric space $\mathbb{U}$. We impose standard assumptions on the data to guarantee the existence and uniqueness of solutions to (1.1). These are described in §3.1. Let $r\colon\mathbb{R}^d\times\mathbb{U}\to\mathbb{R}$ be a continuous function bounded from below, which without loss of generality we assume to be nonnegative, referred to as the running cost. As is well known, the ergodic control problem, in its almost sure (or pathwise) formulation, seeks to a.s. minimize over all admissible controls $U$ the functional

$$\limsup_{t\to\infty}\,\frac{1}{t}\int_0^t r(X_s,U_s)\,ds. \tag{1.2}$$

A weaker, average formulation seeks to minimize 

$$\limsup_{t\to\infty}\,\frac{1}{t}\int_0^t \mathbb{E}^U\bigl[r(X_s,U_s)\bigr]\,ds. \tag{1.3}$$
Here $\mathbb{E}^U$ denotes the expectation operator associated with the probability measure on the canonical space of the process under the control $U$. We let $\varrho$ be defined as

$$\varrho = \inf_{U}\,\limsup_{t\to\infty}\,\frac{1}{t}\int_0^t \mathbb{E}^U\bigl[r(X_s,U_s)\bigr]\,ds, \tag{1.4}$$

i.e., the infimum of (1.3) over all admissible controls (for the definition of admissible controls see §3.1). Under suitable hypotheses solutions to the ergodic control problem can be synthesized via the Hamilton-Jacobi-Bellman (HJB) equation

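$$a^{ij}(x)\,\partial_{ij}V + H(x,\nabla V) = \varrho, \tag{1.5}$$
where $a = [a^{ij}]$ is the symmetric matrix $\frac{1}{2}\sigma\sigma^{\mathsf T}$ and
$$H(x,p) = \min_{u\in\mathbb{U}}\,\bigl\{b(x,u)\cdot p + r(x,u)\bigr\}.$$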

The desired characterization is that a stationary Markov control v is optimal for the 
ergodic control problem if and only if it satisfies 

$$H\bigl(x,\nabla V(x)\bigr) = b\bigl(x,v(x)\bigr)\cdot\nabla V(x) + r\bigl(x,v(x)\bigr) \tag{1.6}$$
a.e. in $\mathbb{R}^d$. Obtaining solutions to (1.5) is further complicated by the fact that $\varrho$ is unknown. For controlled Markov chains, the relative value iteration originating in the work of White [20] provides an algorithm for solving the ergodic dynamic programming equation in the finite state, finite action case. Moreover, its ramifications have given rise to popular learning algorithms (Q-learning) [1].
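To make the finite-state algorithm concrete, here is a minimal Python sketch of relative value iteration for a finite average-cost Markov decision problem, written in the classical form $h_{n+1} = Th_n - (Th_n)(s_0)$, where $T$ is the Bellman operator and $s_0$ a fixed reference state. It assumes a unichain, aperiodic model. The function name, the array layout `P[a, s, t]` and `c[s, a]`, and the two-state "wait or repair" example used to exercise it are illustrative assumptions, not taken from [20].

```python
import numpy as np

def relative_value_iteration(P, c, ref=0, tol=1e-10, max_iter=10_000):
    """Relative value iteration for a finite average-cost MDP (sketch).

    P[a, s, t] : probability of moving from state s to state t under action a
    c[s, a]    : one-stage cost of action a in state s
    ref        : reference state whose relative value is pinned to 0
    Returns (rho, h, policy): average-cost estimate, relative values, greedy policy.
    """
    h = np.zeros(c.shape[0])
    for _ in range(max_iter):
        # q[s, a] = c(s, a) + sum_t P(t | s, a) h(t)
        q = c + np.einsum("ast,t->sa", P, h)
        w = q.min(axis=1)      # Bellman operator (T h)(s)
        h_new = w - w[ref]     # subtract the value at the reference state
        if np.max(np.abs(h_new - h)) < tol:
            return w[ref], h_new, q.argmin(axis=1)
        h = h_new
    raise RuntimeError("relative value iteration did not converge")
```

In a hypothetical two-state model where state 0 is "good" (cost 0) and state 1 is "bad", with action 1 ("repair", cost 2) returning the chain to state 0 and action 0 ("wait", cost 1) leaving it in state 1 with probability 0.9, the iteration selects "repair" in the bad state and returns the average cost $2/11$. The subtraction of the value at the reference state plays the role that the term $-\varphi(t,0)$ plays in the continuous-time scheme (1.7).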

In [3] we introduced a continuous time, continuous state space analog of White's relative value iteration (RVI) given by the quasilinear parabolic evolution equation
$$\partial_t\varphi(t,x) = a^{ij}(x)\,\partial_{ij}\varphi(t,x) + H\bigl(x,\nabla\varphi(t,x)\bigr) - \varphi(t,0), \qquad \varphi(0,x) = \varphi_0(x). \tag{1.7}$$

Under a uniform (geometric) ergodicity condition that ensures the well-posedness of the associated HJB equation, we showed in [3] that the solution of (1.7) converges as $t\to\infty$ to a solution of (1.5), the limit being independent of the initial condition $\varphi_0$. In a related work we extended these results to zero-sum stochastic differential games and controlled diffusions under the risk sensitive criterion [5].
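As a hedged numerical illustration of (1.7), the sketch below applies an explicit finite-difference scheme to a hypothetical one-dimensional model that is not treated in the paper: $a\equiv\frac12$ ($\sigma\equiv1$), $b(x,u)=u$, $r(x,u)=x^2+u^2$, $\mathbb{U}=[-M,M]$. For this model $V(x)=x^2$ and $\varrho=1$ away from the action constraint, so the expected limit is $\varphi(t,x)\to V(x)-V(0)+\varrho = x^2+1$. The truncation to $[-L,L]$ with a reflecting (zero-flux) boundary, the grid sizes and the function name are our assumptions; the step sizes satisfy $dt\le dx^2/(2a)$ and $M\,dx\le 2a$, which keeps the scheme monotone.

```python
import numpy as np

def rvi_1d(L=4.0, dx=0.1, dt=0.004, T=10.0, M=5.0, sigma=1.0):
    """Explicit finite-difference sketch of the RVI (1.7) for the hypothetical
    model b(x, u) = u, r(x, u) = x**2 + u**2, U = [-M, M], on [-L, L]
    with a reflecting boundary used as a truncation assumption."""
    x = np.arange(-L, L + dx / 2, dx)
    i0 = int(np.argmin(np.abs(x)))            # grid index of the origin
    a = 0.5 * sigma**2                        # a = sigma^2 / 2
    phi = np.zeros_like(x)                    # bounded initial condition phi_0 = 0
    for _ in range(int(round(T / dt))):
        ext = np.pad(phi, 1, mode="reflect")  # ghost points mirror the interior
        d1 = (ext[2:] - ext[:-2]) / (2 * dx)  # central first difference
        d2 = (ext[2:] - 2 * phi + ext[:-2]) / dx**2
        u = np.clip(-d1 / 2, -M, M)           # minimizer of u*p + u**2 over [-M, M]
        ham = u * d1 + x**2 + u**2            # H(x, phi_x)
        phi = phi + dt * (a * d2 + ham - phi[i0])  # RVI step: subtract phi(t, 0)
    return x, phi
```

Calling `x, phi = rvi_1d()` should give values close to $x^2+1$ near the origin, e.g. about 1 at $x=0$ and about 2 at $x=1$. Because central differences are exact on quadratics, the interior error here comes mainly from the artificial boundary, whose influence near the origin is very small for $L=4$.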

Even though the work in [3] was probably the first such study of convergence of a relative value iteration scheme for continuous time and space Markov processes, the blanket stability hypothesis imposed there limits the scope of these results. Models of controlled diffusions enjoying uniform geometric ergodicity do not arise often in applications. Rather, what we frequently encounter is a running cost whose structure penalizes unstable behavior and thus renders all stationary optimal controls stable. Such is the case for the quadratic costs typically used in linear control models. A fairly general class of running costs of this type, which includes 'norm-like' costs, consists of costs satisfying the near-monotone condition:

$$\bigl\{x\in\mathbb{R}^d : \min_{u\in\mathbb{U}} r(x,u) \le \varrho\bigr\}\ \text{is a compact set.} \tag{1.8}$$
In this paper we relax the blanket geometric ergodicity assumption and study the relative value iteration in (1.7) under the near-monotone hypothesis (1.8). It is well known that for near-monotone costs the HJB equation (1.5) possesses a solution $V$, unique up to an additive constant, which is bounded below in $\mathbb{R}^d$ [4]. However, this uniqueness result is restricted. In general, for $\beta>\varrho$ the equation
$$a^{ij}(x)\,\partial_{ij}V + H(x,\nabla V) = \beta \tag{1.9}$$
can have a multitude of solutions which are bounded below [4]. As a result, the policy iteration algorithm (PIA) may fail to converge to the optimal value [2, 17]. In order to guarantee convergence of the PIA to an optimal control, in addition to the near-monotone assumption, a blanket Lyapunov condition is imposed in [17, Theorem 5.2] which renders all stationary Markov controls stable. In contrast, the RVI algorithm always converges to the optimal value function when initialized with some bounded initial value $\varphi_0$. The reason behind the difference in performance of the two algorithms can be explained as follows. First, recall that the PIA consists of the following steps:

1. Initialization. Set $k=0$ and select some stationary Markov control $v_0$ which yields a finite average cost.

2. Value determination. Determine the average cost $\varrho_{v_k}$ under the control $v_k$ and obtain a solution $V_k$ to the Poisson equation
$$a^{ij}(x)\,\partial_{ij}V_k(x) + b^i\bigl(x,v_k(x)\bigr)\,\partial_iV_k(x) + r\bigl(x,v_k(x)\bigr) = \varrho_{v_k}, \qquad x\in\mathbb{R}^d.$$

3. Termination. If
$$H(x,\nabla V_k) = b\bigl(x,v_k(x)\bigr)\cdot\nabla V_k(x) + r\bigl(x,v_k(x)\bigr) \quad\text{a.e.},$$
then return $v_k$.

4. Policy improvement. Select $v_{k+1}\in\mathfrak{U}_{\mathrm{SM}}$ which satisfies
$$v_{k+1}(x)\in\operatorname*{Arg\,min}_{u\in\mathbb{U}}\,\bigl[b(x,u)\cdot\nabla V_k(x) + r(x,u)\bigr], \qquad x\in\mathbb{R}^d.$$
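The four steps above have a direct finite-state analogue, sketched below in Python for a unichain average-cost Markov decision problem. Value determination solves the discrete Poisson equation $\varrho + h(s) = c_v(s) + (P_vh)(s)$ with the normalization $h(s_0)=0$, and policy improvement keeps the current action whenever it is already (numerically) optimal, a standard safeguard against cycling between tied actions; the tolerance `1e-12`, the function name and the array layout `P[a, s, t]`, `c[s, a]` are our assumptions.

```python
import numpy as np

def policy_iteration(P, c, policy, ref=0, max_iter=100):
    """Policy iteration for a finite unichain average-cost MDP (sketch).

    P[a, s, t] : transition probabilities; c[s, a] : one-stage costs;
    policy[s]  : initial action in state s (step 1, initialization).
    """
    n = c.shape[0]
    idx = np.arange(n)
    policy = np.asarray(policy)
    for _ in range(max_iter):
        # Step 2, value determination: rho + h(s) = c_v(s) + (P_v h)(s), h(ref) = 0.
        P_v = P[policy, idx, :]
        c_v = c[idx, policy]
        A = np.zeros((n + 1, n + 1))
        A[:n, :n] = np.eye(n) - P_v
        A[:n, n] = 1.0
        A[n, ref] = 1.0
        sol = np.linalg.solve(A, np.append(c_v, 0.0))
        h, rho = sol[:n], sol[n]
        # Step 4, policy improvement: greedy w.r.t. h, keeping current action on ties.
        q = c + np.einsum("ast,t->sa", P, h)
        keep = q[idx, policy] <= q.min(axis=1) + 1e-12
        new_policy = np.where(keep, policy, q.argmin(axis=1))
        # Step 3, termination: no state can be strictly improved.
        if np.array_equal(new_policy, policy):
            return rho, h, policy
        policy = new_policy
    raise RuntimeError("policy iteration did not terminate")
```

Started from the "always wait" policy in a hypothetical two-state "wait or repair" model, the first value determination gives average cost $1/2$, the improvement step switches to "repair" in the bad state, and the second pass terminates with average cost $2/11$. In finite unichain models this works from any initial policy; the difficulty discussed next is specific to the unbounded state space, where the growth of $V_0$ matters.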

It is straightforward to show that if $V$ is a solution to (1.9) whose growth rate does not exceed the growth rate of an optimal value function $V^*$ from (1.5), or in other words the weighted norm $\|V\|_{V^*}$ is finite, then $\beta=\varrho$ and $V$ is an optimal value function. It turns out that if the value function $V_0$ determined at the first step $k=0$ does not grow faster than an optimal value function $V^*$, then the algorithm will converge to an optimal value function. Otherwise, it might converge to a solution of (1.9) that is not optimal. However, the growth rate of an optimal value function is not known, and there is no simple way of selecting the initial control $v_0$ that will result in the right growth rate for $V_0$. To do so one must solve an HJB-type equation, which is precisely what the PIA tries to avoid. In contrast, as we show in this paper, the solution of the RVI algorithm has the property that $x\mapsto\varphi(t,x)$ has the same growth rate as the optimal value function $V$, asymptotically in $t$. This is an essential ingredient of the mechanism responsible for convergence.

The proof of convergence of (1.7) is facilitated by the study of the value iteration (VI) equation
$$\partial_t\bar\varphi(t,x) = a^{ij}(x)\,\partial_{ij}\bar\varphi(t,x) + H\bigl(x,\nabla\bar\varphi(t,x)\bigr) - \varrho, \qquad \bar\varphi(0,x) = \varphi_0(x). \tag{1.10}$$



The initial condition is the same as in (1.7). Also $\varrho$ is as in (1.4), so it is assumed known. Note that if $\varphi$ is a solution of (1.7), then
$$\bar\varphi(t,x) = \varphi(t,x) - \varrho t + \int_0^t\varphi(s,0)\,ds, \qquad (t,x)\in\mathbb{R}_+\times\mathbb{R}^d, \tag{1.11}$$
solves (1.10). We have in particular that
$$\bar\varphi(t,x) - \bar\varphi(t,0) = \varphi(t,x) - \varphi(t,0) \qquad \forall x\in\mathbb{R}^d,\ \forall t\ge0. \tag{1.12}$$
It follows that the function $f = \varphi - \bar\varphi$ does not depend on $x\in\mathbb{R}^d$ and satisfies
$$\frac{df}{dt} + f = \varrho - \bar\varphi(t,0). \tag{1.13}$$
Conversely, if $\bar\varphi$ is a solution of (1.10), then solving (1.13) one obtains a corresponding solution of (1.7) that takes the form [3, Lemma 4.4]:
$$\varphi(t,x) = \bar\varphi(t,x) - \int_0^t e^{s-t}\,\bar\varphi(s,0)\,ds + \varrho\,(1-e^{-t}), \qquad (t,x)\in\mathbb{R}_+\times\mathbb{R}^d. \tag{1.14}$$

It also follows from (1.14) that if $t\mapsto\bar\varphi(t,x)$ is bounded for each $x\in\mathbb{R}^d$, then so is the map $t\mapsto\varphi(t,x)$, and if the former converges as $t\to\infty$, pointwise in $x$, then so does the latter.
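The passage from the VI to the RVI in (1.14) is easy to evaluate numerically. The Python sketch below (function name hypothetical) takes samples of $t\mapsto\bar\varphi(t,0)$ on a time grid, computes $f=\varphi-\bar\varphi$ by a cumulative trapezoidal rule, and returns $\varphi(t,0)$. It forms $e^{t}$ explicitly, so it is only meant for moderate horizons.

```python
import numpy as np

def rvi_at_origin_from_vi(t, phibar0, rho):
    """Evaluate (1.14) at x = 0 from samples of a VI solution (sketch).

    t       : increasing time grid with t[0] = 0 (moderate horizon: e**t is formed)
    phibar0 : samples of t -> phibar(t, 0)
    rho     : optimal ergodic cost
    Returns (phi0, f) with phi0 = phi(t, 0) and f = phi - phibar, which solves
    f' + f = rho - phibar(t, 0), f(0) = 0, as in (1.13).
    """
    g = np.exp(t) * phibar0
    # cumulative trapezoidal rule for int_0^t e^s phibar(s, 0) ds
    cum = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(t))))
    f = rho * (1.0 - np.exp(-t)) - np.exp(-t) * cum
    return phibar0 + f, f
```

For a hypothetical input $\bar\varphi(t,0)=c+e^{-2t}$, which converges to $c$, one has in closed form $f(t)=(\varrho-c)(1-e^{-t})-e^{-t}(1-e^{-t})$, so $\varphi(t,0)\to\varrho$: convergence of the VI at the origin carries over to the RVI, and the RVI value at the origin approaches the optimal ergodic cost.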

We note here that we study solutions of the VI equation that have the stochastic representation
$$\bar\varphi(t,x) = \inf_{U}\,\mathbb{E}^U_x\biggl[\int_0^t r(X_s,U_s)\,ds + \varphi_0(X_t)\biggr] - \varrho t, \tag{1.15}$$
where the infimum is over all admissible controls. These are called canonical solutions (see Definition 3.10). The first term in (1.15) is the total cost over the finite horizon $[0,t]$ with terminal penalty $\varphi_0$. Under the uniform geometric ergodicity hypothesis used in [3], it is straightforward to show that $t\mapsto\bar\varphi(t,x)$ is bounded, locally uniformly in $x\in\mathbb{R}^d$. In contrast, under the near-monotone hypothesis alone, $t\mapsto\bar\varphi(t,x)$ may diverge for each $x\in\mathbb{R}^d$. To show convergence, we first identify a suitable region of attraction of the solutions of the HJB under the dynamics of (1.7), and then show that all $\omega$-limit points of the semiflow of (1.7) lie in this region.

While we prefer to think of (1.7) as a continuous time and space relative value iteration, it can also be viewed as a 'stabilization of a quasilinear parabolic PDE problem', in analogy to the celebrated result of Has'minskii (see [H]). Thus, the results in this paper are also likely to be of independent interest to the PDE community.
We summarize below the main result of the paper. We make one mild assumption: let $v^*$ be some optimal stationary Markov control, i.e., a measurable function that satisfies (1.6). It is well known that under the near-monotone hypothesis the diffusion under the control $v^*$ is positive recurrent. Let $\mu_{v^*}$ denote the unique invariant probability measure of the diffusion under the control $v^*$. We assume that the value function $V$ in the HJB is integrable under $\mu_{v^*}$.

Theorem 1.1. Suppose that the running cost is near-monotone and that the value function $V$ of the HJB equation (1.5) for the ergodic control problem is integrable with respect to some optimal invariant probability distribution. Then for any bounded initial condition $\varphi_0\in C^2(\mathbb{R}^d)$ it holds that
$$\lim_{t\to\infty}\varphi(t,x) = V(x) - V(0) + \varrho,$$
uniformly on compact sets of $\mathbb{R}^d$.

We also obtain a new stochastic representation for the value function of the HJB 
under near-monotone costs which we state as a corollary. This result is known to hold 
under uniform geometric ergodicity, but under the near-monotone cost hypothesis 
alone it is completely new. 

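Corollary 1.2. Under the assumptions of Theorem 1.1, the value function $V$ of the HJB for the ergodic control problem has the stochastic representation
$$V(x) - V(y) = \lim_{t\to\infty}\biggl(\inf_{U}\,\mathbb{E}^U_x\biggl[\int_0^t r(X_s,U_s)\,ds\biggr] - \inf_{U}\,\mathbb{E}^U_y\biggl[\int_0^t r(X_s,U_s)\,ds\biggr]\biggr)$$
for all $x,y\in\mathbb{R}^d$.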
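As a hedged illustration of Corollary 1.2, consider a hypothetical one-dimensional linear-quadratic model, not treated in the paper, in which the compactness of the action set is dropped so that closed forms are available: $a\equiv\frac12$, $b(x,u)=u$, $r(x,u)=x^2+u^2$, $u\in\mathbb{R}$. Substituting $W(t,x)=p(t)x^2+q(t)$ into $\partial_tW=\frac12\partial_{xx}W+\min_u\{u\,\partial_xW+x^2+u^2\}$, $W(0,\cdot\,)=0$, gives $\dot p=1-p^2$, $\dot q=p$, $p(0)=q(0)=0$, so that
$$\inf_U\,\mathbb{E}^U_x\biggl[\int_0^t r(X_s,U_s)\,ds\biggr] = \tanh(t)\,x^2 + \ln\cosh(t).$$
Moreover $V(x)=x^2$ solves (1.5) with $\varrho=1$. Hence
$$\inf_U\,\mathbb{E}^U_x\biggl[\int_0^t r\,ds\biggr] - \inf_U\,\mathbb{E}^U_y\biggl[\int_0^t r\,ds\biggr] = \tanh(t)\,(x^2-y^2) \xrightarrow[t\to\infty]{} x^2-y^2 = V(x)-V(y),$$
in agreement with the corollary in this special case.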

We would like to note here that in [7] the authors study the value iteration algorithm for countable state controlled Markov chains with 'norm-like' running costs, i.e., $\min_u r(x,u)\to\infty$ as $|x|\to\infty$. The initial condition $\varphi_0$ is chosen as some Lyapunov function corresponding to some stable control $v_0$. We leave it to the reader to verify that under these hypotheses $\|V\|_{\varphi_0}<\infty$. Moreover, they assume that $\varphi_0$ is integrable with respect to the invariant probability distribution (see the earlier discussion concerning the PIA). Thus their hypotheses imply that the optimal value function $V$ from (1.5) is also integrable with respect to $\mu_{v^*}$.

The paper is organized as follows. The next section introduces the notation used in the paper. Section 3 starts by describing in detail the model and the assumptions imposed. In §3.2 we discuss some basic properties of the HJB equation for the ergodic control problem under near-monotone costs and the implications of the integrability of the value function under some optimal invariant distribution. In §3.3 we address the issue of existence and uniqueness of solutions to (1.7) and (1.10) and describe some basic properties of these solutions. In §3.4 we exhibit a region of attraction for the solutions of the VI. In §4 we derive some essential growth estimates for the solutions of the VI and show that these solutions have locally bounded oscillation in $\mathbb{R}^d$, uniformly in $t\ge0$. Section 5 is dedicated to the proof of convergence of the solutions of the RVI, while the final section concludes with some pointers to future work.
2. Notation. The standard Euclidean norm in $\mathbb{R}^d$ is denoted by $|\cdot|$. The set of nonnegative real numbers is denoted by $\mathbb{R}_+$, $\mathbb{N}$ stands for the set of natural numbers, and $\mathbb{I}$ denotes the indicator function. We denote by $\tau(A)$ the first exit time of a process $\{X_t,\ t\in\mathbb{R}_+\}$ from a set $A\subset\mathbb{R}^d$, defined by
$$\tau(A) = \inf\{t>0 : X_t\notin A\}.$$
The closure and the boundary of a set $A\subset\mathbb{R}^d$ are denoted by $\bar A$ and $\partial A$, respectively. The open ball of radius $R$ in $\mathbb{R}^d$, centered at the origin, is denoted by $B_R$, and we let $\tau_R = \tau(B_R)$ and $\breve\tau_R = \tau(B_R^c)$.

The term domain in $\mathbb{R}^d$ refers to a nonempty, connected open subset of the Euclidean space $\mathbb{R}^d$. For a domain $D\subset\mathbb{R}^d$, the space $C^k(D)$ ($C^\infty(D)$) refers to the class of all functions whose partial derivatives up to order $k$ (of any order) exist and are continuous.

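We adopt the notation $\partial_t=\frac{\partial}{\partial t}$, and for $i,j\in\mathbb{N}$, $\partial_i=\frac{\partial}{\partial x_i}$ and $\partial_{ij}=\frac{\partial^2}{\partial x_i\,\partial x_j}$. We often use the standard summation rule that repeated subscripts and superscripts are summed from 1 through $d$. For example,
$$a^{ij}\partial_{ij}\varphi + b^i\partial_i\varphi = \sum_{i,j=1}^d a^{ij}\frac{\partial^2\varphi}{\partial x_i\,\partial x_j} + \sum_{i=1}^d b^i\frac{\partial\varphi}{\partial x_i}.$$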

For a nonnegative multi-index $\alpha=(\alpha_1,\dots,\alpha_d)$ we let $D^\alpha=\partial_1^{\alpha_1}\cdots\partial_d^{\alpha_d}$. Let $Q$ be a domain in $\mathbb{R}_+\times\mathbb{R}^d$. Recall that $C^{r,k+2r}(Q)$ stands for the set of bounded continuous functions $\varphi(t,x)$ defined on $Q$ such that the derivatives $D^\alpha\partial_t^\ell\varphi$ are bounded and continuous in $Q$ for
$$|\alpha| + 2\ell \le k+2r, \qquad \ell\le r.$$
In general, if $\mathcal{X}$ is a space of real-valued functions on $Q$, $\mathcal{X}_{\mathrm{loc}}$ consists of all functions $f$ such that $f\varphi\in\mathcal{X}$ for every $\varphi\in C_c^\infty(Q)$, the space of smooth functions on $Q$ with compact support. In this manner we obtain, for example, the spaces $C^{1,2}_{\mathrm{loc}}(Q)$ and $W^{2,p}_{\mathrm{loc}}(Q)$.

We won't introduce here the parabolic Sobolev space $W^{r,k+2r,p}(Q)$, since the solutions of (1.7) and (1.10) are in $C^{1,2}_{\mathrm{loc}}\bigl((0,\infty)\times\mathbb{R}^d\bigr)$. The only exceptions are the function $\varphi$ in Theorem 4.7 and the function $\varphi_T$ used in the proof of Lemma 4.8. We refer the reader to [13] for definitions and properties of the parabolic Sobolev space.

3. Problem Statement and Preliminary Results. 

3.1. The model. The dynamics are modeled by a controlled diffusion process $X=\{X_t,\ t\ge0\}$ taking values in the $d$-dimensional Euclidean space $\mathbb{R}^d$, and governed by the Itô stochastic differential equation in (1.1). All random processes in (1.1) live in a complete probability space $(\Omega,\mathfrak{F},\mathbb{P})$. The process $W$ is a $d$-dimensional standard Wiener process independent of the initial condition $X_0$. The control process $U$ takes values in a compact, metrizable set $\mathbb{U}$, and $U_t(\omega)$ is jointly measurable in $(t,\omega)\in[0,\infty)\times\Omega$. Moreover, it is non-anticipative: for $s<t$, $W_t-W_s$ is independent of
$$\mathfrak{F}_s = \text{the completion of } \sigma\{X_0,\,U_r,\,W_r,\ r\le s\} \text{ relative to } (\mathfrak{F},\mathbb{P}).$$
Such a process $U$ is called an admissible control, and we let $\mathfrak{U}$ denote the set of all admissible controls.

We impose the following standard assumptions on the drift $b$ and the diffusion matrix $\sigma$ to guarantee existence and uniqueness of solutions to (1.1).

(A1) Local Lipschitz continuity: The functions
$$b=[b^1,\dots,b^d]^{\mathsf T}\colon\mathbb{R}^d\times\mathbb{U}\to\mathbb{R}^d \quad\text{and}\quad \sigma=[\sigma^{ij}]\colon\mathbb{R}^d\to\mathbb{R}^{d\times d}$$
are locally Lipschitz in $x$ with a Lipschitz constant $\kappa_R>0$ depending on $R>0$. In other words, for all $x,y\in B_R$ and $u\in\mathbb{U}$,
$$|b(x,u)-b(y,u)| + \|\sigma(x)-\sigma(y)\| \le \kappa_R\,|x-y|.$$

(A2) Affine growth condition: $b$ and $\sigma$ satisfy a global growth condition of the form
$$|b(x,u)|^2 + \|\sigma(x)\|^2 \le \kappa_1\bigl(1+|x|^2\bigr) \qquad \forall(x,u)\in\mathbb{R}^d\times\mathbb{U},$$
where $\|\sigma\|^2=\operatorname{trace}(\sigma\sigma^{\mathsf T})$.

(A3) Local nondegeneracy: For each $R>0$, we have
$$\sum_{i,j=1}^d a^{ij}(x)\,\xi_i\xi_j \ge \kappa_R^{-1}|\xi|^2 \qquad \forall x\in B_R,$$
for all $\xi=(\xi_1,\dots,\xi_d)\in\mathbb{R}^d$.

We also assume that $b$ is continuous in $(x,u)$.
In integral form, (1.1) is written as
$$X_t = X_0 + \int_0^t b(X_s,U_s)\,ds + \int_0^t\sigma(X_s)\,dW_s. \tag{3.1}$$
The last term on the right-hand side of (3.1) is an Itô stochastic integral. We say that a process $X=\{X_t(\omega)\}$ is a solution of (1.1) if it is $\mathfrak{F}_t$-adapted, continuous in $t$, defined for all $\omega\in\Omega$ and $t\in[0,\infty)$, and satisfies (3.1) for all $t\in[0,\infty)$ at once a.s.

We define the family of operators $L^u\colon C^2(\mathbb{R}^d)\to C(\mathbb{R}^d)$, where $u\in\mathbb{U}$ plays the role of a parameter, by
$$L^uf(x) = a^{ij}(x)\,\partial_{ij}f(x) + b^i(x,u)\,\partial_if(x), \qquad u\in\mathbb{U}. \tag{3.2}$$
We refer to $L^u$ as the controlled extended generator of the diffusion.

Of fundamental importance in the study of functionals of $X$ is Itô's formula. For $f\in C^2(\mathbb{R}^d)$ and with $L^u$ as defined in (3.2), it holds that
$$f(X_t) = f(X_0) + \int_0^t L^{U_s}f(X_s)\,ds + M_t, \quad\text{a.s.}, \tag{3.3}$$
where
$$M_t = \int_0^t\bigl\langle\nabla f(X_s),\,\sigma(X_s)\,dW_s\bigr\rangle$$
is a local martingale. Krylov's extension of the Itô formula [HI p. 122] extends (3.3) to functions $f$ in the local Sobolev space $W^{2,p}_{\mathrm{loc}}(\mathbb{R}^d)$, $p\ge d$.
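Taking expectations in (3.3), when $M_t$ is a true martingale, gives Dynkin's formula $\mathbb{E}[f(X_t)] = f(x) + \mathbb{E}\bigl[\int_0^tL^{U_s}f(X_s)\,ds\bigr]$, which is used repeatedly below. The Python sketch checks it by Euler-Maruyama simulation for a hypothetical uncontrolled Ornstein-Uhlenbeck model $dX_t=-X_t\,dt+dW_t$ (so $a=\frac12$, $b(x)=-x$) and $f(x)=x^2$, for which $Lf(x)=1-2x^2$ and $\mathbb{E}_x[X_t^2]=x^2e^{-2t}+\frac12(1-e^{-2t})$. The model, step count and sample size are our choices; the two sides agree only up to Monte Carlo error and an $O(h)$ time-discretization bias.

```python
import numpy as np

def dynkin_check(x0=1.0, T=1.0, n_steps=100, n_paths=200_000, seed=0):
    """Euler-Maruyama check of Dynkin's formula for the hypothetical model
    dX = -X dt + dW (a = 1/2, b(x) = -x) and f(x) = x**2, L f(x) = 1 - 2 x**2."""
    rng = np.random.default_rng(seed)
    h = T / n_steps
    x = np.full(n_paths, x0)
    integral = np.zeros(n_paths)   # left-point sum approximating int_0^T L f(X_s) ds
    for _ in range(n_steps):
        integral += (1.0 - 2.0 * x**2) * h
        x = x - x * h + np.sqrt(h) * rng.standard_normal(n_paths)
    lhs = np.mean(x**2)                # E[f(X_T)]
    rhs = x0**2 + np.mean(integral)    # f(x0) + E[int_0^T L f(X_s) ds]
    return lhs, rhs
```

With the defaults, both returned values should be close to $e^{-2}+\frac12(1-e^{-2})\approx0.568$.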

Recall that a control is called Markov if $U_t=v(t,X_t)$ for a measurable map $v\colon\mathbb{R}_+\times\mathbb{R}^d\to\mathbb{U}$, and it is called stationary Markov if $v$ does not depend on $t$, i.e., $v\colon\mathbb{R}^d\to\mathbb{U}$. Correspondingly, the equation
$$X_t = x_0 + \int_0^t b\bigl(X_s,v(s,X_s)\bigr)\,ds + \int_0^t\sigma(X_s)\,dW_s \tag{3.4}$$
is said to have a strong solution if, given a Wiener process $(W_t,\mathfrak{F}_t)$ on a complete probability space $(\Omega,\mathfrak{F},\mathbb{P})$, there exists a process $X$ on $(\Omega,\mathfrak{F},\mathbb{P})$, with $X_0=x_0\in\mathbb{R}^d$, which is continuous, $\mathfrak{F}_t$-adapted, and satisfies (3.4) for all $t$ at once, a.s. A strong solution is called unique if any two such solutions $X$ and $X'$ agree $\mathbb{P}$-a.s., when viewed as elements of $C([0,\infty),\mathbb{R}^d)$. It is well known that under Assumptions (A1)-(A3), for any Markov control $v$, (3.4) has a unique strong solution [10].

Let $\mathfrak{U}_{\mathrm{SM}}$ denote the set of stationary Markov controls. Under $v\in\mathfrak{U}_{\mathrm{SM}}$, the process $X$ is strong Markov, and we denote its transition function by $P^t_v(x,\cdot\,)$. It also follows from the work of [6, liJ] that under $v\in\mathfrak{U}_{\mathrm{SM}}$, the transition probabilities of $X$ have densities which are locally Hölder continuous. Thus $L_v$ defined by
$$L_vf(x) = a^{ij}(x)\,\partial_{ij}f(x) + b^i\bigl(x,v(x)\bigr)\,\partial_if(x), \qquad v\in\mathfrak{U}_{\mathrm{SM}},$$
for $f\in C^2(\mathbb{R}^d)$, is the generator of a strongly continuous semigroup on $C_b(\mathbb{R}^d)$, which is strong Feller. We let $\mathbb{P}^v_x$ and $\mathbb{E}^v_x$ denote the probability measure and the expectation operator on the canonical space of the process under the control $v\in\mathfrak{U}_{\mathrm{SM}}$, conditioned on the process $X$ starting from $x\in\mathbb{R}^d$ at $t=0$.

3.2. The ergodic control problem. We assume that the running cost function $r\colon\mathbb{R}^d\times\mathbb{U}\to\mathbb{R}_+$ is continuous and locally Lipschitz in its first argument uniformly in $u\in\mathbb{U}$. Without loss of generality we let $\kappa_R$ be a Lipschitz constant of $r$ over $B_R$. More specifically, we assume that
$$|r(x,u)-r(y,u)| \le \kappa_R\,|x-y| \qquad \forall x,y\in B_R,\ \forall u\in\mathbb{U},$$
and all $R>0$.

As mentioned in [JI], an important class of running cost functions arising in practice for which the ergodic control problem is well behaved are the near-monotone cost functions.

The ergodic control problem for near-monotone cost functions is characterized by the following theorem, which we quote from [4]. Note that we choose to normalize the value function $V^*$ differently here, in order to facilitate the use of weighted norms.

Theorem 3.1. There exists a unique function $V^*\in C^2(\mathbb{R}^d)$ which solves the HJB equation (1.5) and satisfies $\min_{\mathbb{R}^d}V^*=1$. Also, a control $v\in\mathfrak{U}_{\mathrm{SM}}$ is optimal with respect to the criteria (1.2) and (1.3) if and only if it satisfies (1.6) a.e. in $\mathbb{R}^d$. Moreover, recalling that $\breve\tau_R=\tau(B_R^c)$, $R>0$, we have
$$V^*(x) = \inf_{v\in\mathfrak{U}_{\mathrm{SSM}}}\,\mathbb{E}^v_x\biggl[\int_0^{\breve\tau_R}\bigl(r(X_t,v(X_t))-\varrho\bigr)\,dt + V^*(X_{\breve\tau_R})\biggr] \qquad \forall x\in B_R^c, \tag{3.5}$$
for all $R>0$.

Recall that a control $v\in\mathfrak{U}_{\mathrm{SM}}$ is called stable if the associated diffusion is positive recurrent. We denote the set of such controls by $\mathfrak{U}_{\mathrm{SSM}}$, and let $\mu_v$ denote the unique invariant probability measure on $\mathbb{R}^d$ for the diffusion under the control $v\in\mathfrak{U}_{\mathrm{SSM}}$. Recall that $v\in\mathfrak{U}_{\mathrm{SSM}}$ if and only if there exist an inf-compact function $\mathcal{V}\in C^2(\mathbb{R}^d)$, a bounded domain $D\subset\mathbb{R}^d$, and a constant $\varepsilon>0$ satisfying
$$L_v\mathcal{V}(x) \le -\varepsilon \qquad \forall x\in D^c.$$
It follows that the optimal control $v^*$ in Theorem 3.1 is stable.

We make the following mild technical assumption which is in effect throughout 
the paper: 

Assumption 3.2. The value function $V^*$ is integrable with respect to some optimal invariant probability distribution $\mu_{v^*}$.

Remark 3.3. Assumption 3.2 is equivalent to the following [4, Lemma 3.3.4]: there exist an optimal stationary control $v^*$, an inf-compact function $\mathcal{V}\in C^2(\mathbb{R}^d)$, and an open ball $B\subset\mathbb{R}^d$ such that
$$L_{v^*}\mathcal{V}(x) \le -V^*(x) \qquad \forall x\in B^c. \tag{3.6}$$
For the rest of the paper $v^*\in\mathfrak{U}_{\mathrm{SSM}}$ denotes some fixed control satisfying (1.6) and (3.6).
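As a hedged sanity check of Assumption 3.2 and (3.6), consider a hypothetical one-dimensional linear-quadratic model (an illustration only; the action set is taken to be $\mathbb{R}$ rather than compact so that closed forms exist): $a\equiv\frac12$, $b(x,u)=u$, $r(x,u)=x^2+u^2$. Then $V^*(x)=x^2+1$ solves (1.5) with $\varrho=1$ and $\min V^*=1$, the optimal control is $v^*(x)=-x$, and the controlled process is the Ornstein-Uhlenbeck process $dX_t=-X_t\,dt+dW_t$, whose invariant law is $\mu_{v^*}=N(0,\frac12)$. Thus
$$\int_{\mathbb{R}}V^*\,d\mu_{v^*} = \tfrac12+1 = \tfrac32 < \infty,$$
and with the inf-compact function $\mathcal{V}(x)=x^4$,
$$L_{v^*}\mathcal{V}(x) = 6x^2-4x^4 \le -(x^2+1) = -V^*(x) \iff 4x^4-7x^2-1\ge0 \iff x^2\ge\tfrac{7+\sqrt{65}}{8}\approx1.88,$$
so (3.6) holds, for example, with $B=B_{1.4}$.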

Remark 3.4. Assumption 3.2 is fairly mild. In the case that $r$ is bounded, it is equivalent to the statement that the mean hitting times to an open bounded set are integrable with respect to some optimal invariant probability distribution. In the case of one-dimensional diffusions, provided $\sigma(x)\ge\sigma_0$ for some constant $\sigma_0>0$ and
$$\limsup_{|x|\to\infty}\,\frac{x\,b\bigl(x,v^*(x)\bigr)}{\sigma^2(x)} < -\frac12,$$
the mean hitting time of $0\in\mathbb{R}$ is bounded above by a second-degree polynomial in $x$ [15, Theorem 5.6]. Therefore, in this case, the existence of second moments for $\mu_{v^*}$ implies Assumption 3.2.

We need the following lemma. 

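Lemma 3.5. Under Assumption 3.2,
$$\lim_{t\to\infty}\mathbb{E}^{v^*}_x\bigl[V^*(X_t)\bigr] = \mu_{v^*}(V^*) := \int_{\mathbb{R}^d}V^*(x)\,\mu_{v^*}(dx) \qquad \forall x\in\mathbb{R}^d,$$
where, as defined earlier, $\mu_{v^*}$ is the invariant probability measure of the diffusion under the control $v^*$. Also, there exists a constant $m_r$ depending on $r$ such that
$$\sup_{t\ge0}\,\mathbb{E}^{v^*}_x\bigl[V^*(X_t)\bigr] \le m_r\bigl(V^*(x)+1\bigr) \qquad \forall x\in\mathbb{R}^d. \tag{3.7}$$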
t>o 
Proof. Since $r$ is nonnegative, by Dynkin's formula we have
$$\mathbb{E}^{v^*}_x\bigl[V^*(X_t)\bigr] \le V^*(x) + \varrho t \qquad \forall t\ge0,\ \forall x\in\mathbb{R}^d. \tag{3.8}$$
Therefore, since $V^*$ is integrable with respect to $\mu_{v^*}$ by Assumption 3.2, the first result follows by [15, Theorem 5.3 (i)]. The bound in (3.7) is the continuous time analogue of (14.5) in [18]. Recall that a skeleton of a continuous-time Markov process is a discrete-time Markov process with transition probability $P=\int_0^\infty\alpha(dt)\,P^t$, where $\alpha$ is a probability measure on $(0,\infty)$. Since the diffusion is nondegenerate, any skeleton of the process is $\phi$-irreducible, with an irreducibility measure absolutely continuous with respect to the Lebesgue measure (for a definition of $\phi$-irreducibility we refer the reader to [18, Chapter 4]). It is also straightforward to show that compact subsets of $\mathbb{R}^d$ are petite. Define the transition probability $P$ by



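$$Pf(x) = \int_{\mathbb{R}^d}P(x,dy)\,f(y) := \mathbb{E}^{v^*}_x\bigl[f(X_t)\bigr]\Big|_{t=1}, \qquad x\in\mathbb{R}^d,$$
for all bounded functions $f\in C(\mathbb{R}^d)$, and
$$g_r(x) = \mathbb{E}^{v^*}_x\biggl[\int_0^1 r\bigl(X_s,v^*(X_s)\bigr)\,ds\biggr], \qquad x\in\mathbb{R}^d.$$
Then (1.5) translates into the discrete time Poisson equation:
$$PV^*(x) - V^*(x) = \varrho - g_r(x), \qquad x\in\mathbb{R}^d. \tag{3.9}$$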

It easily follows from the near-monotone hypothesis (1.8) that there exist a constant $\varepsilon>0$ and a ball $B_{R_0}\subset\mathbb{R}^d$, $R_0>0$, such that $g_r(x)-\varrho\ge\varepsilon$ for all $x\in B_{R_0}^c$. Since, in addition, $\int_{\mathbb{R}^d}V^*(x)\,\mu_{v^*}(dx)<\infty$, it follows by [18, Theorem 14.0.1] that there exists a constant $m$ such that
$$\sum_{n=0}^\infty\bigl|P^ng_r(x)-\varrho\bigr| \le m\bigl(V^*(x)+1\bigr) \qquad \forall x\in\mathbb{R}^d. \tag{3.10}$$
By (3.9)-(3.10) we obtain
$$P^nV^*(x) = V^*(x) - \sum_{k=0}^{n-1}\bigl(P^kg_r(x)-\varrho\bigr) \le (m+1)\bigl(V^*(x)+1\bigr). \tag{3.11}$$



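By (3.8) and (3.11), writing the arbitrary $t\in\mathbb{R}_+$ as $t=n+\delta$, where $n$ is the integer part of $t$, and using the Markov property, we obtain
$$\begin{aligned} \mathbb{E}^{v^*}_x\bigl[V^*(X_t)\bigr] &= \mathbb{E}^{v^*}_x\Bigl[\mathbb{E}^{v^*}_{X_\delta}\bigl[V^*(X_{t-\delta})\bigr]\Bigr] = \mathbb{E}^{v^*}_x\bigl[P^nV^*(X_\delta)\bigr]\\ &\le \mathbb{E}^{v^*}_x\bigl[(m+1)\bigl(V^*(X_\delta)+1\bigr)\bigr]\\ &\le (m+1)\bigl(V^*(x)+\varrho\delta+1\bigr)\\ &\le (m+1)\bigl(V^*(x)+\varrho+1\bigr) \qquad \forall t\ge0,\ \forall x\in\mathbb{R}^d, \end{aligned}$$
thus establishing (3.7). □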
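The telescoping step (3.9)-(3.11) can be seen in a finite-state stand-in for the skeleton chain. The Python sketch below (function name and the 3-state chain used in the test are hypothetical) solves the Poisson equation $Ph-h=\varrho-g$ with $h(0)=0$, where $\varrho$ is the stationary mean of $g$, and checks $P^nh = h-\sum_{k<n}(P^kg-\varrho)$; for an irreducible aperiodic finite chain the partial sums stay bounded, mirroring (3.10).

```python
import numpy as np

def poisson_telescoping(P, g, n_max=50):
    """Finite-state check of (3.9)-(3.11) (sketch).

    P : irreducible, aperiodic transition matrix; g : one-step costs (plays g_r).
    Solves P h - h = rho - g with h[0] = 0 and returns the largest deviation
    from P^n h = h - sum_{k<n} (P^k g - rho) over n = 1..n_max."""
    S = len(g)
    # invariant distribution: pi (P - I) = 0, sum(pi) = 1
    A = np.vstack([P.T - np.eye(S), np.ones(S)])
    pi = np.linalg.lstsq(A, np.append(np.zeros(S), 1.0), rcond=None)[0]
    rho = pi @ g
    # Poisson equation (I - P) h = g - rho, normalized by h[0] = 0
    B = np.vstack([np.eye(S) - P, np.eye(S)[0]])
    h = np.linalg.lstsq(B, np.append(g - rho, 0.0), rcond=None)[0]
    Pn_h, Pk_g, partial = h.copy(), g.astype(float), np.zeros(S)
    max_err = 0.0
    for _ in range(n_max):
        partial += Pk_g - rho          # now sum_{k<n} (P^k g - rho)
        Pk_g = P @ Pk_g
        Pn_h = P @ Pn_h                # now P^n h
        max_err = max(max_err, float(np.max(np.abs(Pn_h - (h - partial)))))
    return rho, h, max_err
```

The identity holds exactly in exact arithmetic, so `max_err` only reflects floating-point roundoff.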

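Definition 3.6. We let $C_{V^*}(\mathbb{R}^d)$ denote the Banach space of functions $f\in C(\mathbb{R}^d)$ with norm
$$\|f\|_{V^*} := \sup_{x\in\mathbb{R}^d}\,\frac{|f(x)|}{V^*(x)}.$$
We also define
$$\mathfrak{O}_{V^*} = \bigl\{f\in C_{V^*}(\mathbb{R}^d)\cap C^2(\mathbb{R}^d) : f\ge0\bigr\}.$$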



3.3. The relative value iteration. The RVI and VI equations in (1.7) and (1.10) can also be written in the form
$$\partial_t\varphi(t,x) = \min_{u\in\mathbb{U}}\bigl[L^u\varphi(t,x)+r(x,u)\bigr] - \varphi(t,0), \qquad \varphi(0,x)=\varphi_0(x), \tag{3.12}$$
$$\partial_t\bar\varphi(t,x) = \min_{u\in\mathbb{U}}\bigl[L^u\bar\varphi(t,x)+r(x,u)\bigr] - \varrho, \qquad \bar\varphi(0,x)=\varphi_0(x). \tag{3.13}$$

Definition 3.7. Let $\hat v=\{\hat v_t,\ t\in\mathbb{R}_+\}$ denote a measurable selector from the minimizer in (3.13) corresponding to a solution $\bar\varphi\in C^{1,2}_{\mathrm{loc}}\bigl((0,\infty)\times\mathbb{R}^d\bigr)$. This is also a measurable selector from the minimizer in (3.12), provided $\bar\varphi$ and $\varphi$ are related by (1.11) and (1.14), and vice versa. Note that the Markov control associated with $\hat v$ is computed 'backward' in time (see (1.15)). Hence, for each $t\ge0$ we define the (nonstationary) Markov control
$$\hat v^t = \bigl\{\hat v^t_s=\hat v_{t-s},\ s\in[0,t]\bigr\}.$$
Also, we adopt the simplifying notation
$$\tilde r(x,u) = r(x,u) - \varrho.$$



In most of the statements of intermediary results, the initial data $\varphi_0$ is assumed without loss of generality to be nonnegative. We start with a theorem that proves the existence of a solution to (3.13) that admits the stochastic representation in (1.15). This does not require Assumption 3.2. First we need the following definition.

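Definition 3.8. We define $\mathbb{R}^d_T=(0,T)\times\mathbb{R}^d$, and let $\bar{\mathbb{R}}^d_T$ denote its closure. We also let $C_{V^*}(\bar{\mathbb{R}}^d_T)$ denote the Banach space of functions in $C(\bar{\mathbb{R}}^d_T)$ with norm
$$\|f\|_{V^*,T} := \sup_{(t,x)\in\mathbb{R}^d_T}\,\frac{|f(t,x)|}{V^*(x)}.$$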



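Theorem 3.9. Provided $\varphi_0\in\mathfrak{O}_{V^*}$, then
$$\bar\varphi(t,x) = \inf_{U\in\mathfrak{U}}\,\mathbb{E}^U_x\biggl[\int_0^t\tilde r(X_s,U_s)\,ds + \varphi_0(X_t)\biggr] \tag{3.14a}$$
is the minimal solution of (3.13) in $C^{1,2}_{\mathrm{loc}}\bigl((0,\infty)\times\mathbb{R}^d\bigr)\cap C\bigl([0,\infty)\times\mathbb{R}^d\bigr)$ which is bounded below on $\bar{\mathbb{R}}^d_T$, for any $T>0$. With $\hat v^t$ as defined in Definition 3.7, it admits the representation
$$\bar\varphi(t,x) = \mathbb{E}^{\hat v^t}_x\biggl[\int_0^t\tilde r\bigl(X_s,\hat v^t_s(X_s)\bigr)\,ds + \varphi_0(X_t)\biggr], \tag{3.14b}$$
and it holds that
$$\mathbb{E}^{\hat v^t}_x\bigl[\bar\varphi(t-\tau_R,X_{\tau_R})\,\mathbb{I}\{\tau_R<t\}\bigr] \xrightarrow[R\to\infty]{} 0 \tag{3.15}$$
for all $(t,x)\in\mathbb{R}_+\times\mathbb{R}^d$. Moreover, $\bar\varphi(t,\cdot\,)\ge-\varrho t$ and $\bar\varphi$ satisfies the estimate
$$\|\bar\varphi\|_{V^*,T} \le (1+\varrho T)\,\max\bigl(1,\|\varphi_0\|_{V^*}\bigr) \qquad \forall T>0. \tag{3.16}$$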
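For a concrete instance of the canonical solution, take again a hypothetical linear-quadratic model (action set $\mathbb{R}$, not compact): $a\equiv\frac12$, $b(x,u)=u$, $r(x,u)=x^2+u^2$, so $\varrho=1$ and $V^*(x)=x^2+1$, with $\varphi_0=0\in\mathfrak{O}_{V^*}$. Substituting $\bar\varphi(t,x)=p(t)x^2+q(t)$ into (3.13) gives $\dot p=1-p^2$, $\dot q=p-1$, $p(0)=q(0)=0$, hence $\bar\varphi(t,x)=\tanh(t)\,x^2+\ln\cosh(t)-t$, with minimizer $\hat v_t(x)=-\tanh(t)\,x$ and backward control $\hat v^t_s(x)=-\tanh(t-s)\,x$ as in Definition 3.7. The Python sketch (function name and step count are our choices) integrates this Riccati system with the classical fourth-order Runge-Kutta method so that the closed form and the bound (3.16) can be checked.

```python
import numpy as np

def lq_vi_riccati(T=3.0, n=3000):
    """RK4 sketch for p' = 1 - p**2, q' = p - 1, p(0) = q(0) = 0.

    For the hypothetical model a = 1/2, b(x, u) = u, r(x, u) = x**2 + u**2
    (u unconstrained), rho = 1 and phi_0 = 0, the function
    phibar(t, x) = p(t) x**2 + q(t) solves the VI (3.13)."""
    def rhs(y):
        p, q = y
        return np.array([1.0 - p * p, p - 1.0])

    h = T / n
    y = np.zeros(2)
    for _ in range(n):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y[0], y[1]   # p(T), q(T)
```

Here $|\bar\varphi(t,x)|\le x^2+\ln2$, comfortably within the bound $(1+\varrho T)V^*(x)$ of (3.16); note also that $\bar\varphi(t,x)-\bar\varphi(t,0)=\tanh(t)\,x^2\to V^*(x)-V^*(0)$.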



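Proof. Let $r^n$ and $\varphi_0^n$, for $n\in\mathbb{N}$, be smooth truncations of $r$ and $\varphi_0$, respectively, satisfying $\|r^n\|_\infty\le n$ and $\|\varphi_0^n\|_\infty\le n$, and such that $r^n\uparrow r$ and $\varphi_0^n\uparrow\varphi_0$ as $n\to\infty$. Let $\varrho_n$ denote the optimal ergodic cost corresponding to $r^n$. The boundary value problem
$$\begin{aligned} \partial_t\tilde\varphi^n_R(t,x) &= \min_{u\in\mathbb{U}}\bigl[L^u\tilde\varphi^n_R(t,x)+r^n(x,u)\bigr] \quad\text{in } (0,T)\times B_R,\\ \tilde\varphi^n_R(0,x) &= \varphi_0^n(x) \quad \forall x\in B_R, \qquad \tilde\varphi^n_R(t,\cdot\,)\big|_{\partial B_R} = \varphi_0^n \quad \forall t\in[0,T], \end{aligned} \tag{3.17}$$
has a unique nonnegative solution in $C^{1,2}\bigl((0,T)\times B_R\bigr)\cap C\bigl([0,T]\times\bar B_R\bigr)$ for all $T>0$ and $R>0$. This solution has the stochastic representation
$$\tilde\varphi^n_R(t,x) = \inf_{U\in\mathfrak{U}}\,\mathbb{E}^U_x\biggl[\int_0^{\tau_R\wedge t}r^n(X_s,U_s)\,ds + \tilde\varphi^n_R\bigl(t-\tau_R\wedge t,\,X_{\tau_R\wedge t}\bigr)\biggr], \tag{3.18}$$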
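where, as defined in §2, $\tau_R$ denotes the first exit time from the ball $B_R$. By (3.18) we obtain
$$\begin{aligned} \tilde\varphi^n_R(t,x) &\le \mathbb{E}^{v^*}_x\biggl[\int_0^{\tau_R\wedge t}r^n\bigl(X_s,v^*(X_s)\bigr)\,ds + \tilde\varphi^n_R\bigl(t-\tau_R\wedge t,\,X_{\tau_R\wedge t}\bigr)\biggr]\\ &\le \max\bigl(1,\|\varphi_0\|_{V^*}\bigr)\,\mathbb{E}^{v^*}_x\biggl[\int_0^{\tau_R\wedge t}r\bigl(X_s,v^*(X_s)\bigr)\,ds + V^*(X_{\tau_R\wedge t})\biggr]\\ &\le \max\bigl(1,\|\varphi_0\|_{V^*}\bigr)\bigl(V^*(x)+\varrho t\bigr). \end{aligned}$$
Therefore, by the interior estimates of solutions of (3.17) (see [HI Theorem 5.1]), the derivatives $\{D^\alpha\partial_t^\ell\tilde\varphi^n_R : |\alpha|+2\ell\le2,\ R>0,\ n\in\mathbb{N}\}$ are locally Hölder equicontinuous in $\mathbb{R}^d_T$. Thus, passing to the limit as $R\to\infty$ along a subsequence, we obtain a nonnegative function $\tilde\varphi_n\in C^{1,2}_{\mathrm{loc}}(\mathbb{R}^d_T)\cap C(\bar{\mathbb{R}}^d_T)$, for all $T>0$, which satisfies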

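$$\begin{aligned} \partial_t\tilde\varphi_n(t,x) &= \min_{u\in\mathbb{U}}\bigl[L^u\tilde\varphi_n(t,x)+r^n(x,u)\bigr] \quad\text{in } (0,\infty)\times\mathbb{R}^d,\\ \tilde\varphi_n(0,x) &= \varphi_0^n(x) \quad \forall x\in\mathbb{R}^d. \end{aligned} \tag{3.19}$$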
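By using Dynkin's formula on the cylinder $[0,t]\times B_R$, we obtain from (3.19) that
$$\tilde\varphi_n(t,x) = \inf_{U\in\mathfrak{U}}\,\mathbb{E}^U_x\biggl[\int_0^{\tau_R\wedge t}r^n(X_s,U_s)\,ds + \tilde\varphi_n\bigl(t-\tau_R\wedge t,\,X_{\tau_R\wedge t}\bigr)\biggr]. \tag{3.20}$$
It follows by (3.18) that $\|\tilde\varphi_n(t,\cdot\,)\|_\infty\le n(t+1)$ for all $n\in\mathbb{N}$ and $t\ge0$. By (3.20) we have the inequality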



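$$\begin{aligned} \tilde\varphi_n(t,x) &\le \mathbb{E}^U_x\biggl[\int_0^{\tau_R\wedge t}r^n(X_s,U_s)\,ds + \tilde\varphi_n\bigl(t-\tau_R\wedge t,\,X_{\tau_R\wedge t}\bigr)\biggr]\\ &\le \mathbb{E}^U_x\biggl[\int_0^{\tau_R\wedge t}r^n(X_s,U_s)\,ds + \varphi_0^n(X_t)\,\mathbb{I}\{\tau_R\ge t\}\biggr] + n(t+1)\,\mathbb{P}^U_x(\tau_R<t) \end{aligned} \tag{3.21}$$
for all $U\in\mathfrak{U}$. Taking limits as $R\to\infty$ in (3.21), using dominated convergence, we obtain
$$\tilde\varphi_n(t,x) \le \mathbb{E}^U_x\biggl[\int_0^t r^n(X_s,U_s)\,ds + \varphi_0^n(X_t)\biggr] \qquad \forall U\in\mathfrak{U}. \tag{3.22}$$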
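Note that
$$0 \le \tilde\varphi_n(t,x) \le \limsup_{R\to\infty}\tilde\varphi^n_R(t,x) \le \max\bigl(1,\|\varphi_0\|_{V^*}\bigr)\bigl(V^*(x)+\varrho t\bigr). \tag{3.23}$$
Hence, as mentioned earlier, the derivatives $\{D^\alpha\partial_t^\ell\tilde\varphi_n : |\alpha|+2\ell\le2,\ n\in\mathbb{N}\}$ are locally Hölder equicontinuous in $(0,\infty)\times\mathbb{R}^d$. Also, as shown in [4, p. 119], we have $\varrho_n\to\varrho$ as $n\to\infty$. Let $\{k_n\}_{n\in\mathbb{N}}\subset\mathbb{N}$ be an arbitrary sequence. Then there exists some subsequence $\{k'_n\}\subset\{k_n\}$ such that $\tilde\varphi_{k'_n}\to\tilde\varphi\in C^{1,2}_{\mathrm{loc}}(\mathbb{R}^d_T)\cap C(\bar{\mathbb{R}}^d_T)$, for all $T>0$, and $\tilde\varphi$ satisfies
$$\begin{aligned} \partial_t\tilde\varphi(t,x) &= \min_{u\in\mathbb{U}}\bigl[L^u\tilde\varphi(t,x)+r(x,u)\bigr] \quad\text{in } (0,\infty)\times\mathbb{R}^d,\\ \tilde\varphi(0,x) &= \varphi_0(x) \quad \forall x\in\mathbb{R}^d. \end{aligned} \tag{3.24}$$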
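Let $\hat v$ denote a Markov control associated with the minimizer in (3.24) as in Definition 3.7, and $\hat v^t$ the corresponding backward control. By using Dynkin's formula on the cylinder $[0,t]\times B_R$, we obtain from (3.24)
$$\tilde\varphi(t,x) = \inf_{U\in\mathfrak{U}}\,\mathbb{E}^U_x\biggl[\int_0^{\tau_R\wedge t}r(X_s,U_s)\,ds + \tilde\varphi\bigl(t-\tau_R\wedge t,\,X_{\tau_R\wedge t}\bigr)\biggr], \tag{3.25a}$$
$$\tilde\varphi(t,x) = \mathbb{E}^{\hat v^t}_x\biggl[\int_0^{\tau_R\wedge t}r\bigl(X_s,\hat v^t_s(X_s)\bigr)\,ds + \tilde\varphi\bigl(t-\tau_R\wedge t,\,X_{\tau_R\wedge t}\bigr)\biggr]. \tag{3.25b}$$
Since $\tilde\varphi(t,\cdot\,)$ is nonnegative, letting $R\to\infty$ in (3.25b), by Fatou's lemma we obtain
$$\tilde\varphi(t,x) \ge \mathbb{E}^{\hat v^t}_x\biggl[\int_0^t r\bigl(X_s,\hat v^t_s(X_s)\bigr)\,ds + \varphi_0(X_t)\biggr] \ge \inf_{U\in\mathfrak{U}}\,\mathbb{E}^U_x\biggl[\int_0^t r(X_s,U_s)\,ds + \varphi_0(X_t)\biggr]. \tag{3.26}$$
Taking limits as $n\to\infty$ in (3.22), using monotone convergence for the first term on the right-hand side, we obtain
$$\tilde\varphi(t,x) \le \mathbb{E}^U_x\biggl[\int_0^t r(X_s,U_s)\,ds + \varphi_0(X_t)\biggr] \qquad \forall U\in\mathfrak{U}. \tag{3.27}$$
By (3.26)-(3.27) we have
$$\tilde\varphi(t,x) = \inf_{U\in\mathfrak{U}}\,\mathbb{E}^U_x\biggl[\int_0^t r(X_s,U_s)\,ds + \varphi_0(X_t)\biggr], \tag{3.28a}$$
$$\tilde\varphi(t,x) = \mathbb{E}^{\hat v^t}_x\biggl[\int_0^t r\bigl(X_s,\hat v^t_s(X_s)\bigr)\,ds + \varphi_0(X_t)\biggr]. \tag{3.28b}$$
Let $\bar\varphi(t,x)=\tilde\varphi(t,x)-\varrho t$. Then $\bar\varphi$ solves (3.13), and (3.14a)-(3.14b) follow by (3.28a)-(3.28b). It is also clear that $\bar\varphi(t,x)\ge-\varrho t$, which together with (3.23) implies (3.16).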

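By (3.25b) we have
$$\tilde\varphi(t,x) = \mathbb{E}^{\hat v^t}_x\biggl[\int_0^{\tau_R\wedge t}r\bigl(X_s,\hat v^t_s(X_s)\bigr)\,ds + \varphi_0(X_t)\,\mathbb{I}\{\tau_R\ge t\}\biggr] + \mathbb{E}^{\hat v^t}_x\bigl[\tilde\varphi(t-\tau_R,X_{\tau_R})\,\mathbb{I}\{\tau_R<t\}\bigr]. \tag{3.29}$$
The first term on the right-hand side of (3.29) tends to the right-hand side of (3.28b) by monotone convergence as $R\uparrow\infty$, so the second term tends to zero. Since $\bar\varphi(s,\cdot\,)=\tilde\varphi(s,\cdot\,)-\varrho s$ and $\mathbb{P}^{\hat v^t}_x(\tau_R<t)\to0$ as $R\to\infty$, (3.15) holds.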

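Suppose $\tilde\varphi'$ is a solution of (3.24) in $C^{1,2}_{\mathrm{loc}}(\mathbb{R}^d_T)\cap C(\bar{\mathbb{R}}^d_T)$, for some $T>0$, which is bounded below, and $\hat v'$ is an associated Markov control from the minimizer of (3.24). Applying Dynkin's formula on the cylinder $[0,t]\times B_R$ and letting $R\to\infty$ using Fatou's lemma, we obtain
$$\tilde\varphi'(t,x) \ge \mathbb{E}^{\hat v'^{\,t}}_x\biggl[\int_0^t r\bigl(X_s,\hat v'^{\,t}_s(X_s)\bigr)\,ds + \varphi_0(X_t)\biggr] \ge \inf_{U\in\mathfrak{U}}\,\mathbb{E}^U_x\biggl[\int_0^t r(X_s,U_s)\,ds + \varphi_0(X_t)\biggr] = \tilde\varphi(t,x).$$
Therefore $\bar\varphi(t,x)$ is the minimal solution of (3.13) in $C^{1,2}_{\mathrm{loc}}\bigl((0,\infty)\times\mathbb{R}^d\bigr)\cap C\bigl([0,\infty)\times\mathbb{R}^d\bigr)$ which is bounded below on $\bar{\mathbb{R}}^d_T$, for each $T>0$. □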

In the interest of economy of language we refer to the solution in (3.14a) as canonical. This is detailed in the following definition.

Definition 3.10. Given an initial condition $\varphi_0\in\mathfrak{O}_{V^*}$, we define the canonical solution to the VI in (3.13) as the solution which was constructed in the proof of Theorem 3.9 and was shown to admit the stochastic representation in (3.14a). In other words, this is the minimal solution of (3.13) in $C^{1,2}_{\mathrm{loc}}\bigl((0,\infty)\times\mathbb{R}^d\bigr)\cap C\bigl([0,\infty)\times\mathbb{R}^d\bigr)$ which is bounded below on $\bar{\mathbb{R}}^d_T$, for any $T>0$. The canonical solution to the VI in turn defines the canonical solution to the RVI in (3.12) via (1.14).

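For the rest of the paper a solution to the RVI or VI is always meant to be a canonical solution. In summary, these are characterized by:
$$\varphi(t,x) + \int_0^t\varphi(s,0)\,ds = \inf_{U\in\mathfrak{U}}\,\mathbb{E}^U_x\biggl[\int_0^t r(X_s,U_s)\,ds + \varphi_0(X_t)\biggr] = \int_0^t\mathbb{E}^{\hat v^t}_x\bigl[r\bigl(X_s,\hat v^t_s(X_s)\bigr)\bigr]\,ds + \mathbb{E}^{\hat v^t}_x\bigl[\varphi_0(X_t)\bigr]. \tag{3.30}$$
Similarly,
$$\bar\varphi(t,x) = \inf_{U\in\mathfrak{U}}\,\mathbb{E}^U_x\biggl[\int_0^t\tilde r(X_s,U_s)\,ds + \varphi_0(X_t)\biggr] = \int_0^t\mathbb{E}^{\hat v^t}_x\bigl[\tilde r\bigl(X_s,\hat v^t_s(X_s)\bigr)\bigr]\,ds + \mathbb{E}^{\hat v^t}_x\bigl[\varphi_0(X_t)\bigr].$$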

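The next lemma provides an important estimate for the canonical solutions of the VI.

Lemma 3.11. Provided $\varphi_0\in C_{V^*}(\mathbb{R}^d)\cap C^2(\mathbb{R}^d)$, the canonical solution $\bar\varphi\in C^{1,2}_{\mathrm{loc}}\bigl((0,\infty)\times\mathbb{R}^d\bigr)\cap C\bigl([0,\infty)\times\mathbb{R}^d\bigr)$ of (3.13) satisfies the bound
$$\mathbb{E}^{\hat v^t}_x\bigl[\varphi_0(X_t)-V^*(X_t)\bigr] \le \bar\varphi(t,x)-V^*(x) \le \mathbb{E}^{v^*}_x\bigl[\varphi_0(X_t)-V^*(X_t)\bigr] \tag{3.31}$$
for all $(t,x)\in\mathbb{R}_+\times\mathbb{R}^d$.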

Proof. By (1.5) and (3.13) we obtain

$$-\partial_t(V^*-\bar\varphi) + L^{v^*}(V^*-\bar\varphi) \le 0$$






ARI ARAPOSTATHIS, VIVEK S. BORKAR AND K. SURESH KUMAR 



and

$$-\partial_t(V^*-\bar\varphi) + L^{v_t}(V^*-\bar\varphi) \ge 0,$$

from which, by an application of Itô's formula to $V^*(X_s)-\bar\varphi(t-s,X_s)$, $s\in[0,t)$, it follows that

$$\mathbb{E}^{v^*}_x\big[V^*(X_t)-\varphi_0(X_t)\big] \le V^*(x)-\bar\varphi(t,x)$$

and

$$\mathbb{E}^{v}_x\big[V^*(X_t)-\varphi_0(X_t)\big] \ge V^*(x)-\bar\varphi(t,x),$$

respectively, and the estimate follows. □
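A discrete-time, finite-state caricature may help to see the mechanism behind (3.31). The sketch below is only an analogue under stated assumptions and is not part of the analysis: the model data `P`, `R` are hypothetical, value iteration is taken as $h_{n+1} = Th_n - \varrho$ with $(Th)(x) = \min_u\big[r(x,u) + \sum_y P(y\mid x,u)h(y)\big]$, and $(\varrho, V)$ together with an optimal stationary policy $v^*$ are obtained from a long run of discrete relative value iteration. The helper names `bellman` and `apply_policy` are ours. It checks the two-sided bound $P_{u_{n-1}}\cdots P_{u_0}(h_0-V) \le h_n - V \le P_{v^*}^n(h_0-V)$, where $u_k$ is the minimizing policy at step $k$; in the lower bound the most recent minimizer acts first, which mirrors the control $v_{t-s}$ in (3.31).

```python
# Discrete-time, finite-state analogue of the sandwich bound (3.31).
# All model data below are hypothetical and chosen only for illustration.
S, A = 3, 2
P = [  # P[u][x][y]: transition probability x -> y under action u
    [[0.9, 0.1, 0.0], [0.5, 0.4, 0.1], [0.2, 0.3, 0.5]],
    [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1], [0.6, 0.2, 0.2]],
]
R = [[0.0, 0.5], [2.0, 1.0], [5.0, 3.0]]  # R[x][u]: running cost


def bellman(h):
    """Return (Th, policy) with (Th)(x) = min_u [R[x][u] + sum_y P[u][x][y] h[y]]."""
    th, pol = [], []
    for x in range(S):
        q = [R[x][u] + sum(P[u][x][y] * h[y] for y in range(S)) for u in range(A)]
        u_best = min(range(A), key=q.__getitem__)
        th.append(q[u_best])
        pol.append(u_best)
    return th, pol


def apply_policy(pol, f):
    """(P_pol f)(x) = sum_y P[pol[x]][x][y] f[y]."""
    return [sum(P[pol[x]][x][y] * f[y] for y in range(S)) for x in range(S)]


# (rho, V) with V(0) = 0 from a long run of discrete RVI: h <- Th - (Th)(0).
h = [0.0] * S
for _ in range(2000):
    th, _ = bellman(h)
    h = [v - th[0] for v in th]
th, pol_star = bellman(h)
rho, V = th[0], h

h0 = [1.0, 6.0, -2.0]                 # arbitrary initial condition
h, policies = list(h0), []
upper = [h0[x] - V[x] for x in range(S)]
for _ in range(25):                   # value iteration h <- Th - rho
    th, pol = bellman(h)
    policies.append(pol)              # minimizer used at this step
    h = [v - rho for v in th]
    upper = apply_policy(pol_star, upper)   # P_{v*}^n (h0 - V)

lower = [h0[x] - V[x] for x in range(S)]
for pol in policies:                  # builds P_{u_{n-1}} ... P_{u_0} (h0 - V)
    lower = apply_policy(pol, lower)

gap = [h[x] - V[x] for x in range(S)]
print(lower, gap, upper)
```

The upper bound uses only that $v^*$ attains the minimum at $V$, and the lower bound only that each $u_k$ attains it at $h_k$, which is the same pair of one-sided inequalities used in the proof above.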

The uniqueness of the canonical solution in a larger class of functions depends on the growth of $V^*$ and of the coefficients of the SDE in (1.1). Various such uniqueness results can be obtained under different hypotheses on the growth of the data. The following result assumes that $V^*$ has polynomial growth, which is the case in many applications.

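Theorem 3.12. Let $\varphi_0\in\mathcal{O}_{V^*}$ and suppose that for some constants $c_1$, $c_2$ and $m>0$, $V^*(x)\le c_1+c_2|x|^m$. Then any solution $\bar\varphi'\in C^{1,2}_{\mathrm{loc}}((0,T)\times\mathbb{R}^d)\cap C([0,T]\times\mathbb{R}^d)$ of (3.13), for some $T>0$, which is bounded below in $[0,T]\times\mathbb{R}^d$ and satisfies $\|\bar\varphi'\|_{V^*,T}<\infty$, agrees with the canonical solution $\bar\varphi$ on $[0,T]\times\mathbb{R}^d$.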

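Proof. Let $\bar\varphi'$ be a solution satisfying the hypothesis in the theorem, let $\bar\varphi$ be the canonical solution of (3.13), and let $v$ be the associated Markov control as in Definition 3.7. For $\varepsilon>0$, let $\bar\varphi^\varepsilon$ denote the canonical solution of (3.13) with initial data $\varphi_0+\varepsilon V^*$, and $v^\varepsilon$ the associated minimizer. By Theorem 3.9, for each $\varepsilon>0$ we obtain

$$\begin{aligned}
\bar\varphi^\varepsilon(t,x) &= \inf_{U\in\mathfrak{U}}\mathbb{E}^U_x\Big[\int_0^t r(X_s,U_s)\,ds + \varphi_0(X_t) + \varepsilon V^*(X_t)\Big] - \varrho t\\
&\ge -\varrho t + \varepsilon\inf_{U\in\mathfrak{U}}\mathbb{E}^U_x\Big[\int_0^t r(X_s,U_s)\,ds + V^*(X_t)\Big]\\
&\ge \varepsilon V^*(x) - \varrho t.
\end{aligned}$$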
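Therefore, by (3.15), for each $\varepsilon>0$ we have

$$\mathbb{E}^{v^\varepsilon}_x\big[V^*(X_{\tau_R})\,\mathbb{I}\{\tau_R<t\}\big] \xrightarrow[R\to\infty]{} 0 \qquad\forall (t,x)\in[0,T]\times\mathbb{R}^d,$$

which in turn implies, since $\|\bar\varphi'\|_{V^*,T}<\infty$, that

$$\mathbb{E}^{v^\varepsilon}_x\big[\bar\varphi'(t-\tau_R,X_{\tau_R})\,\mathbb{I}\{\tau_R<t\}\big] \xrightarrow[R\to\infty]{} 0 \qquad\forall (t,x)\in[0,T]\times\mathbb{R}^d. \tag{3.34}$$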

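Since $-\partial_t\bar\varphi' + L^{v^\varepsilon_t}\bar\varphi' + r\big(x,v^\varepsilon_t(x)\big) - \varrho \ge 0$, we have that for all $(t,x)\in[0,T]\times\mathbb{R}^d$,

$$\bar\varphi'(t,x) \le \mathbb{E}^{v^\varepsilon}_x\Big[\int_0^{\tau_R\wedge t}\big(r(X_s,v^\varepsilon_{t-s}(X_s)) - \varrho\big)\,ds + \bar\varphi'(t-\tau_R\wedge t, X_{\tau_R\wedge t})\Big], \tag{3.35}$$

and taking limits as $R\to\infty$ in (3.35), using (3.34), it follows that $\bar\varphi'\le\bar\varphi^\varepsilon$ on $[0,T]\times\mathbb{R}^d$.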

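The polynomial growth of $V^*$ implies that there exists a constant $m(x,T)$ such that $\mathbb{E}^U_x[V^*(X_t)]\le m(x,T)$ for all $(t,x)\in[0,T]\times\mathbb{R}^d$ and $U\in\mathfrak{U}$ [4, Theorem 2.2.2]. Therefore, since

$$\bar\varphi^\varepsilon(t,x) \le \mathbb{E}^{v}_x\Big[\int_0^t\big(r(X_s,v_{t-s}(X_s)) - \varrho\big)\,ds + \varphi_0(X_t) + \varepsilon V^*(X_t)\Big] \le \bar\varphi(t,x) + \varepsilon\,m(x,T) \qquad\forall (t,x)\in[0,T]\times\mathbb{R}^d, \tag{3.36}$$

and $\bar\varphi^\varepsilon\ge\bar\varphi$, it follows by (3.36) that $\bar\varphi^\varepsilon\to\bar\varphi$ on $[0,T]\times\mathbb{R}^d$ as $\varepsilon\downarrow0$. Thus $\bar\varphi'\le\bar\varphi$ on $[0,T]\times\mathbb{R}^d$, and by the minimality of $\bar\varphi$ we must have equality. □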

We can also obtain a uniqueness result on a larger class of functions that does not require $V^*$ to have polynomial growth, but assumes that the diffusion matrix is bounded in $\mathbb{R}^d$. This is given in Theorem 3.13 below, whose proof uses the technique in […].

We define the following class of functions:

$$\mathfrak{G} := \Big\{f\in C^2(\mathbb{R}^d) : \lim_{|x|\to\infty} f(x)\,e^{-k|x|^2} = 0, \text{ for some } k>0\Big\}.$$

Theorem 3.13. Suppose $V^*\in\mathfrak{G}$ and that $\|\sigma\|$ is bounded in $\mathbb{R}^d$. Then, provided $\varphi_0\in\mathcal{O}_{V^*}$, there exists a unique solution $\bar\varphi$ to (3.13) such that $\max_{t\in[0,T]}|\bar\varphi(t,\cdot)|\in\mathfrak{G}$ for each $T>0$.

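Proof. Let $\psi\in C^{1,2}_{\mathrm{loc}}((0,\infty)\times\mathbb{R}^d)\cap C([0,\infty)\times\mathbb{R}^d)$ be the minimal nonnegative solution of

$$\begin{aligned}
\partial_t\psi(t,x) &= \min_{u\in\mathbb{U}}\big[L^u\psi(t,x) + r(x,u)\big] &&\text{in } (0,\infty)\times\mathbb{R}^d,\\
\psi(0,x) &= \varphi_0(x) &&\forall x\in\mathbb{R}^d,
\end{aligned} \tag{3.37}$$

and let $\{v_t,\ t\in\mathbb{R}_+\}$ denote a measurable selector from the minimizer in (3.37). Suppose that $\psi'\in C^{1,2}_{\mathrm{loc}}((0,\infty)\times\mathbb{R}^d)\cap C([0,\infty)\times\mathbb{R}^d)$ is any solution of (3.37) satisfying the hypothesis of the theorem, and let $\{v'_t,\ t\in\mathbb{R}_+\}$ denote a measurable selector from the corresponding minimizer. Then $f:=\psi'-\psi$ satisfies, for any $T>0$,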

$$\partial_t f - L^{v_t}f \le 0 \quad\text{and}\quad \partial_t f - L^{v'_t}f \ge 0 \qquad\text{in } (0,T]\times\mathbb{R}^d, \tag{3.38}$$

and $f(0,x)=0$ for all $x\in\mathbb{R}^d$. By (3.16), the hypothesis that $V^*\in\mathfrak{G}$, and the hypothesis on the growth of $f$, it follows that for some $k=k(T)>0$ large enough

$$\lim_{|x|\to\infty}\ \max_{t\in[0,T]}|f(t,x)|\,e^{-k|x|^2} = 0. \tag{3.39}$$

It is straightforward to verify by direct computation, using the bounds on the coefficients of the SDE, that there exists $\gamma=\gamma(k)>1$ such that $g(t,x)=e^{(1+\gamma t)(1+k|x|^2)}$ is a supersolution of

$$\partial_t g - L^u g \ge 0 \quad\text{in } (0,T]\times\mathbb{R}^d \text{ for all } u\in\mathbb{U}, \text{ with } T=\gamma^{-1}. \tag{3.40}$$
By (3.39), for any $\varepsilon>0$ we can select $R>0$ large enough such that $|f(t,x)|\le\varepsilon g(t,x)$ for all $(t,x)\in[0,\gamma^{-1}]\times\partial B_R$. Using (3.38), (3.40) and Dynkin's formula on the strip $[0,\gamma^{-1}]\times B_R$, it follows that $|f(t,x)|\le\varepsilon g(t,x)$ for all $(t,x)\in[0,\gamma^{-1}]\times B_R$. Since $\varepsilon>0$ was arbitrary, this implies $f=0$, or equivalently that $\psi'=\psi$ on $[0,\gamma^{-1}]\times\mathbb{R}^d$.

Since, by (3.16), $\psi(\gamma^{-1},\cdot)\in\mathcal{O}_{V^*}$, we can repeat the argument to show that $\psi'=\psi$ on $[\gamma^{-1},2\gamma^{-1}]\times\mathbb{R}^d$, and that the same holds by induction on $[n\gamma^{-1},(n+1)\gamma^{-1}]\times\mathbb{R}^d$, $n=2,3,\dots$, until we cover the interval $[0,T]$. This shows that $\psi'=\psi$ on $[0,T]\times\mathbb{R}^d$, and since $T>0$ was arbitrary the same holds on $[0,\infty)\times\mathbb{R}^d$. □

We do not enforce any of the assumptions of Theorem 3.13 in the rest of the paper. Rather, our analysis is based on the canonical solution to the VI and RVI, which is well defined (see Definition 3.10).

3.4. A region of attraction for the VI algorithm. In this section we describe a region of attraction for the VI algorithm. This is a subset of $C^2(\mathbb{R}^d)$ which is invariant under the semiflow defined by (3.13) and all of whose points are convergent, i.e., converge to a solution of (1.5).

Definition 3.14. We let $\bar\Phi_t[\varphi_0]\colon C^2(\mathbb{R}^d)\to C^2(\mathbb{R}^d)$, $t\in[0,\infty)$, denote the canonical solution (semiflow) of the VI in (3.13) starting from $\varphi_0$, and $\Phi_t[\varphi_0]$ denote the corresponding canonical solution (semiflow) of the RVI in (3.12). Let $\mathcal{H}$ denote the set of solutions of the HJB in (1.5), i.e.,

$$\mathcal{H} = \{V^*+c : c\in\mathbb{R}\}.$$

Also, for $c\in\mathbb{R}$ we define the set $\mathcal{G}_c\subset C^2(\mathbb{R}^d)$ by

$$\mathcal{G}_c = \big\{h\in C^2(\mathbb{R}^d) : h-V^*\ge c,\ \|h\|_{V^*}<\infty\big\}.$$

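We claim that for each $c\in\mathbb{R}$, $\mathcal{G}_c$ is invariant under the flow $\bar\Phi_t$. Indeed, by (3.7) and (3.31), if $\varphi_0\in\mathcal{G}_c$, then we have that

$$\bar\Phi_t[\varphi_0](x) - V^*(x) \ge \mathbb{E}^{v}_x\big[\varphi_0(X_t)-V^*(X_t)\big] \ge c \qquad\forall (t,x)\in\mathbb{R}_+\times\mathbb{R}^d.$$

Since translating $\varphi_0$ by a constant simply translates the orbit $\bar\Phi_t[\varphi_0]$, without loss of generality we let $c=0$, and we show that all the points of $\mathcal{G}_0$ are convergent.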

Theorem 3.15. Under Assumption 3.2, for each $\varphi_0\in\mathcal{G}_0$ the orbit $\bar\Phi_t[\varphi_0]$, and therefore also $\Phi_t[\varphi_0]$, converges as $t\to\infty$ to a point in $\mathcal{H}\cap\mathcal{G}_0$.

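Proof. Since, as we showed in the paragraph preceding the theorem, $\bar\Phi_t[\varphi_0]\in\mathcal{G}_0$ for all $t\ge0$, by (3.14a) we have, for some constant $m_0>0$,

$$0 \le \bar\Phi_t[\varphi_0](x) - V^*(x) \le \mathbb{E}^{v^*}_x\big[\varphi_0(X_t)-V^*(X_t)\big] \le m_0\,\|\varphi_0-V^*\|_{V^*}\big(V^*(x)+1\big) \qquad\forall (t,x)\in\mathbb{R}_+\times\mathbb{R}^d. \tag{3.41}$$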
Since $\bar\Phi_t[\varphi_0](x)-V^*(x)\ge0$, and $\int_{\mathbb{R}^d}\bar\Phi_t[\varphi_0](x)\,\mu_{v^*}(dx)$ is finite by Assumption 3.2, it follows by integrating (3.41) with respect to $\mu_{v^*}$ that the map

$$t\mapsto\int_{\mathbb{R}^d}\bar\Phi_t[\varphi_0](x)\,\mu_{v^*}(dx) \tag{3.42}$$

is nonincreasing and bounded below. Hence it must be constant on the $\omega$-limit set of $\varphi_0$, denoted by $\omega(\varphi_0)$. Let $h\in\omega(\varphi_0)$ and define

$$f(t,x) := -\partial_t\bar\Phi_t[h](x) + L^{v^*}\big(\bar\Phi_t[h](x)-V^*(x)\big). \tag{3.43}$$

Then $f(t,x)\ge0$ for all $(t,x)$, and by applying Itô's formula to (3.43), we obtain

$$\bar\Phi_t[h](x) - V^*(x) - \mathbb{E}^{v^*}_x\big[h(X_t)-V^*(X_t)\big] = -\mathbb{E}^{v^*}_x\Big[\int_0^t f(t-s,X_s)\,ds\Big]. \tag{3.44}$$

Integrating (3.44) with respect to the invariant distribution $\mu_{v^*}$ we obtain

$$\int_{\mathbb{R}^d}\big(\bar\Phi_t[h](x)-h(x)\big)\,\mu_{v^*}(dx) = -\int_0^t\int_{\mathbb{R}^d}f(t-s,x)\,\mu_{v^*}(dx)\,ds \qquad\forall t\ge0. \tag{3.45}$$

Since the term on the left-hand side of (3.45) equals 0, as we argued above, it follows that $f(t,x)=0$ for a.e. $(t,x)$, which in turn implies that

$$\lim_{t\to\infty}\bar\Phi_t[h](x) = V^*(x) - \int_{\mathbb{R}^d}\big(V^*(y)-h(y)\big)\,\mu_{v^*}(dy).$$

It follows that $\omega(\varphi_0)\subset\mathcal{H}\cap\mathcal{G}_0$, and since the map in (3.42) is nonincreasing, it is straightforward to verify that $\omega(\varphi_0)$ must be a singleton. □
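The mechanism of the proof of Theorem 3.15 can be mimicked for a finite-state, discrete-time model, which may help intuition. The sketch below is an illustration under assumptions only: the transition data `P` and costs `R` are hypothetical, value iteration is taken as $h_{n+1}=Th_n-\varrho$ with $(Th)(x)=\min_u\big[r(x,u)+\sum_y P(y\mid x,u)h(y)\big]$, and $\mu$ is the invariant distribution of an optimal stationary policy $v^*$. If $h_0\ge V$, then $h_n\ge V$ by monotonicity of $T$, and $\mu(h_{n+1})\le\mu(r_{v^*})+\mu(P_{v^*}h_n)-\varrho=\mu(h_n)$, which is the discrete counterpart of the statement that (3.42) is nonincreasing and bounded below. The helper name `bellman` is ours.

```python
# Discrete-time analogue of the monotone quantity (3.42) in the proof of
# Theorem 3.15. The MDP data P and R are hypothetical.
S, A = 3, 2
P = [  # P[u][x][y]: transition probability x -> y under action u
    [[0.9, 0.1, 0.0], [0.5, 0.4, 0.1], [0.2, 0.3, 0.5]],
    [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1], [0.6, 0.2, 0.2]],
]
R = [[0.0, 0.5], [2.0, 1.0], [5.0, 3.0]]  # R[x][u]: running cost


def bellman(h):
    """Return (Th, policy) with (Th)(x) = min_u [R[x][u] + sum_y P[u][x][y] h[y]]."""
    th, pol = [], []
    for x in range(S):
        q = [R[x][u] + sum(P[u][x][y] * h[y] for y in range(S)) for u in range(A)]
        u_best = min(range(A), key=q.__getitem__)
        th.append(q[u_best])
        pol.append(u_best)
    return th, pol


# rho, V (with V(0) = 0) and an optimal stationary policy via discrete RVI.
h = [0.0] * S
for _ in range(2000):
    th, _ = bellman(h)
    h = [v - th[0] for v in th]
th, pol_star = bellman(h)
rho, V = th[0], h

# Invariant distribution mu of the optimal policy (power iteration).
mu = [1.0 / S] * S
for _ in range(5000):
    mu = [sum(mu[x] * P[pol_star[x]][x][y] for x in range(S)) for y in range(S)]

# Value iteration h <- Th - rho, started in the analogue of G_0 (h0 >= V).
h = [V[x] + bump for x, bump in enumerate([3.0, 0.0, 7.0])]
integrals = []
for _ in range(500):
    integrals.append(sum(m * hx for m, hx in zip(mu, h)))  # sum_x mu(x) h_n(x)
    th, _ = bellman(h)
    h = [v - rho for v in th]

gap = [h[x] - V[x] for x in range(S)]  # tends to a constant vector
print(integrals[0], integrals[-1], gap)
```

In this finite setting the limit is again $V$ plus a constant, which is the analogue of convergence to a point of $\mathcal{H}\cap\mathcal{G}_0$.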

We also have the following result, which does not require Assumption 3.2.

Corollary 3.16. Suppose $\varphi_0\in C^2(\mathbb{R}^d)$ is such that $\varphi_0-V^*$ is bounded. Then $\bar\Phi_t[\varphi_0]$ converges as $t\to\infty$ to a point in $\mathcal{H}$.

Proof. By (3.31), under the hypothesis, $x\mapsto\bar\varphi(t,x)-V^*(x)$ is bounded uniformly in $t$. Thus the result follows as in the proof of Theorem 3.15. □

4. Growth Estimates for Solutions of the Value Iteration. Most of the results of this section do not require Assumption 3.2. It is only needed for Lemma 4.10 and Corollary 4.11. Throughout this section, and also in §5, a solution $\bar\varphi$ ($\varphi$) always refers to the canonical solution of the VI (RVI) without further mention (see Definition 3.10).

Lemma 4.1. Suppose $\varphi_0\in\mathcal{O}_{V^*}$. Then

$$\lim_{t\to\infty}\frac{\bar\varphi(t,x)}{t} = 0 \qquad\forall x\in\mathbb{R}^d.$$
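Proof. Since $\|\varphi_0\|_{V^*}<\infty$, it follows that $\frac1t\,\mathbb{E}^{v^*}_x[\varphi_0(X_t)]\to0$ as $t\to\infty$ (see […, Lemma 3.7.2 (ii)]), and so we have

$$\begin{aligned}
0 &\le \liminf_{t\to\infty}\frac1t\inf_{U\in\mathfrak{U}}\mathbb{E}^U_x\Big[\int_0^t r(X_s,U_s)\,ds + \varphi_0(X_t)\Big] - \varrho = \liminf_{t\to\infty}\frac{\bar\varphi(t,x)}{t}\\
&\le \limsup_{t\to\infty}\frac{\bar\varphi(t,x)}{t} \le \limsup_{t\to\infty}\frac1t\,\mathbb{E}^{v^*}_x\Big[\int_0^t r\big(X_s,v^*(X_s)\big)\,ds + \varphi_0(X_t)\Big] - \varrho = 0.
\end{aligned}$$

The first inequality above uses the fact that $\varphi_0$ is bounded below and that $\varrho$ is the optimal ergodic cost. □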

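Lemma 4.2. Provided $\|\varphi_0\|_\infty<\infty$, it holds that for all $t>0$

$$\bar\varphi(t-\tau,x) - \bar\varphi(t,x) \le \varrho\tau + \operatorname{osc}_{\mathbb{R}^d}\varphi_0 \qquad\forall x\in\mathbb{R}^d,\ \forall\tau\in[0,t].$$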



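Proof. We have

$$\begin{aligned}
\bar\varphi(t-\tau,x) - \bar\varphi(t,x) &= \inf_{U\in\mathfrak{U}}\mathbb{E}^U_x\Big[\int_0^{t-\tau}r(X_s,U_s)\,ds + \varphi_0(X_{t-\tau})\Big] - \inf_{U\in\mathfrak{U}}\mathbb{E}^U_x\Big[\int_0^t r(X_s,U_s)\,ds + \varphi_0(X_t)\Big] + \varrho\tau\\
&\le -\inf_{U\in\mathfrak{U}}\mathbb{E}^U_x\Big[\varphi_0(X_t) - \varphi_0(X_{t-\tau}) + \int_{t-\tau}^t r(X_s,U_s)\,ds\Big] + \varrho\tau\\
&\le \varrho\tau + \operatorname{osc}_{\mathbb{R}^d}\varphi_0,
\end{aligned}$$

where the last inequality uses $r\ge0$. □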

Definition 4.3. We define

$$\mathcal{K} := \Big\{x\in\mathbb{R}^d : \min_{u\in\mathbb{U}}r(x,u)\le\varrho\Big\}.$$

Let $B_0$ be some open bounded ball containing $\mathcal{K}$ and define $\breve\tau:=\tau(B_0^c)$. Also let $\delta_0>0$ be such that $r(x,u)>\varrho+\delta_0$ on $B_0^c$.

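Lemma 4.4. Suppose $\varphi_0\in\mathcal{O}_{V^*}$. Then it holds that

$$\bar\varphi(t,x) \le \mathbb{E}^{v^*}_x\Big[\int_0^{\breve\tau\wedge t}\big(r(X_s,v^*(X_s))-\varrho\big)\,ds + \bar\varphi(t-\breve\tau\wedge t, X_{\breve\tau\wedge t})\Big] \tag{4.1}$$

and

$$\bar\varphi(t,x) \ge \mathbb{E}^{v}_x\Big[\int_0^{\breve\tau\wedge t}\big(r(X_s,v_{t-s}(X_s))-\varrho\big)\,ds + \bar\varphi(t-\breve\tau\wedge t, X_{\breve\tau\wedge t})\Big] \tag{4.2}$$

for all $x\in B_0^c$.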

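Proof. Let $B_R$ be any ball that contains $B_0$, and for $n\in\mathbb{N}$ let $\tau_n$ denote the first exit time from … . Using Dynkin's formula on (3.13), we obtain

$$\bar\varphi(t,x) = \inf_{U\in\mathfrak{U}}\mathbb{E}^U_x\Big[\int_0^{\breve\tau\wedge\tau_n\wedge t}\big(r(X_s,U_s)-\varrho\big)\,ds + \bar\varphi(t-\breve\tau\wedge\tau_n\wedge t, X_{\breve\tau\wedge\tau_n\wedge t})\Big] \tag{4.3}$$

for $x\in B_0^c$. By (4.3) we have

$$\bar\varphi(t,x) \le \mathbb{E}^{v^*}_x\Big[\int_0^{\breve\tau\wedge\tau_n\wedge t}r\big(X_s,v^*(X_s)\big)\,ds\Big] - \varrho\,\mathbb{E}^{v^*}_x\big[\breve\tau\wedge\tau_n\wedge t\big] + \mathbb{E}^{v^*}_x\big[\bar\varphi(t-\breve\tau\wedge\tau_n\wedge t, X_{\breve\tau\wedge\tau_n\wedge t})\big]. \tag{4.4}$$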

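We use the expansion

$$\begin{aligned}
\mathbb{E}^{v^*}_x\big[\bar\varphi(t-\breve\tau\wedge\tau_n\wedge t, X_{\breve\tau\wedge\tau_n\wedge t})\big] &= \mathbb{E}^{v^*}_x\big[\bar\varphi(t-\breve\tau\wedge t, X_{\breve\tau\wedge t})\,\mathbb{I}\{\tau_n>\breve\tau\wedge t\}\big]\\
&\quad + \mathbb{E}^{v^*}_x\big[\bar\varphi(t-\tau_n, X_{\tau_n})\,\mathbb{I}\{\tau_n\le\breve\tau\wedge t\}\big].
\end{aligned}$$

By (3.16) and the fact that, as shown in [4, Corollary 3.7.3],

$$\mathbb{E}^{v^*}_x\big[V^*(X_{\tau_n})\,\mathbb{I}\{\tau_n<t\}\big] \xrightarrow[n\to\infty]{} 0,$$

we obtain

$$\mathbb{E}^{v^*}_x\big[\bar\varphi(t-\tau_n, X_{\tau_n})\,\mathbb{I}\{\tau_n\le\breve\tau\wedge t\}\big] \xrightarrow[n\to\infty]{} 0.$$

Therefore, by taking limits as $n\to\infty$ in (4.4) and also using monotone convergence for the first two terms on the right-hand side, we obtain (4.1).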
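To obtain a lower bound we start from

$$\bar\varphi(t,x) = \mathbb{E}^{v}_x\Big[\int_0^{\breve\tau\wedge\tau_n\wedge t}\big(r(X_s,v_{t-s}(X_s))-\varrho\big)\,ds + \bar\varphi(t-\breve\tau\wedge\tau_n\wedge t, X_{\breve\tau\wedge\tau_n\wedge t})\Big]. \tag{4.5}$$

Since for any fixed $t$ the functions $\{\bar\varphi(t-s,\cdot) : s\le t\}$ are uniformly bounded below, taking limits in (4.5) as $n\to\infty$, we obtain (4.2). □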

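Lemma 4.5. Suppose $\varphi_0\in\mathcal{O}_{V^*}$. Then for any $t>0$ we have

$$\bar\varphi(t,x) \ge \min\Big(\min_{[0,t]\times\bar B_0}\bar\varphi,\ \min_{\mathbb{R}^d}\varphi_0\Big) \qquad\forall x\in B_0^c.$$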



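Proof. Let $x$ be any point in the interior of $B_0^c$. By (4.2) we have

$$\begin{aligned}
\bar\varphi(t,x) &\ge \mathbb{E}^{v}_x\Big[\int_0^{\breve\tau\wedge t}\big(r(X_s,v_{t-s}(X_s))-\varrho\big)\,ds + \bar\varphi(t-\breve\tau\wedge t, X_{\breve\tau\wedge t})\Big]\\
&\ge \delta_0\,\mathbb{E}^{v}_x[\breve\tau\wedge t] + \mathbb{P}^{v}_x(\breve\tau<t)\min_{[0,t]\times\bar B_0}\bar\varphi + \mathbb{P}^{v}_x(\breve\tau\ge t)\min_{\mathbb{R}^d}\varphi_0,
\end{aligned}$$

and the result follows. □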

Remark 4.6. By Lemma 4.5, if $\inf_{[0,\infty)\times\bar B_0}\bar\varphi>-\infty$, then $\bar\varphi$ is bounded below on $[0,\infty)\times\mathbb{R}^d$. If this is the case, the convergence of the VI, and therefore also of the RVI, follows as in the proof of Theorem 3.15. Therefore, without loss of generality, we assume for the remainder of the paper that $\inf_{[0,\infty)\times\bar B_0}\bar\varphi=-\infty$. By Lemma 4.5, this implies that there exists $T_0>0$ such that

$$\bar\varphi(t,x) \ge \min_{[0,t]\times\bar B_0}\bar\varphi \qquad\forall x\in B_0^c,\ \forall t>T_0.$$

We use the parabolic Harnack inequality, which we quote in simplified form from the more general result in […, Theorem 4.1], as follows.

Theorem 4.7 (Parabolic Harnack). Let $B_{2R}\subset\mathbb{R}^d$ be an open ball, and let $\psi$ be a nonnegative caloric function, i.e., a nonnegative solution of

$$\partial_t\psi(t,x) - a^{ij}(t,x)\,\partial_{ij}\psi(t,x) + b^i(t,x)\,\partial_i\psi(t,x) = 0 \quad\text{on } [0,T]\times B_{2R},$$

with $a^{ij}(t,x)$ continuous in $x$ and uniformly nondegenerate on $[0,T]\times B_{2R}$, and $a^{ij}$ and $b^i$ bounded on $[0,T]\times B_{2R}$. Then for any $\tau\in(0,T/4]$ there exists a constant $C_H$, depending only on $R$, $\tau$, the ellipticity constant (and modulus of continuity) of $a^{ij}$, and the bounds of $a^{ij}$ and $b^i$ on $B_{2R}$, such that

$$\max_{[T-3\tau,T-2\tau]\times B_R}\psi \le C_H\min_{[T-\tau,T]\times B_R}\psi.$$

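In the three lemmas that follow we apply Theorem 4.7 with $\tau=1$ and $B_0':=2B_0$.

Lemma 4.8. There exists a constant $M_0$ such that

$$\max_{[T-3,T-2]\times\bar B_0}\bar\varphi - \min_{[0,T]\times\bar B_0}\bar\varphi \le M_0 + C_H\Big(\min_{[T-1,T]\times\bar B_0}\bar\varphi - \min_{[0,T]\times\bar B_0}\bar\varphi\Big) \qquad\forall T>T_0+4.$$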

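Proof. Let $\psi_T(t,x)$ be the unique solution in $W^{1,2,p}_{\mathrm{loc}}((0,T)\times B_0')\cap C([0,T]\times\bar B_0')$ of

$$\begin{aligned}
\partial_t\psi_T(t,x) - a^{ij}(x)\,\partial_{ij}\psi_T(t,x) - b^i\big(x,v_t(x)\big)\,\partial_i\psi_T(t,x) &= 0 \quad\text{on } [0,T]\times B_0',\\
\psi_T(t,x) &= \bar\varphi(t,x) \quad\text{on } \big([0,T]\times\partial B_0'\big)\cup\big(\{0\}\times\bar B_0'\big).
\end{aligned}$$

Since $\tilde\psi_T:=\psi_T-\bar\varphi$ satisfies

$$\partial_t\tilde\psi_T(t,x) - a^{ij}(x)\,\partial_{ij}\tilde\psi_T(t,x) - b^i\big(x,v_t(x)\big)\,\partial_i\tilde\psi_T(t,x) + r\big(x,v_t(x)\big) - \varrho = 0 \quad\text{on } [0,T]\times B_0',$$

and

$$\tilde\psi_T(t,x) = 0 \quad\text{on } \big([0,T]\times\partial B_0'\big)\cup\big(\{0\}\times\bar B_0'\big),$$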
it follows that there exists a constant $\tilde M_0$, which depends only on $B_0'$ (it is independent of $T$), such that

$$\sup_{[0,T]\times B_0'}|\tilde\psi_T| \le \tilde M_0 \qquad\forall T>0. \tag{4.6}$$



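Indeed, this is so because, with $\tau(B_0')$ denoting the first exit time from $B_0'$ and with $v$ as in Definition 3.7, we have

$$\begin{aligned}
|\tilde\psi_T(t,x)| &= \Big|\mathbb{E}^{v}_x\Big[\int_0^{t\wedge\tau(B_0')}\big(r(X_s,v_{t-s}(X_s))-\varrho\big)\,ds\Big]\Big|\\
&\le \sup_{U\in\mathfrak{U}}\mathbb{E}^U_x\Big[\int_0^{\tau(B_0')}\big|r(X_s,U_s)-\varrho\big|\,ds\Big]\\
&\le \|r-\varrho\|_{\infty,B_0'}\ \sup_{x\in B_0'}\ \sup_{U\in\mathfrak{U}}\mathbb{E}^U_x\big[\tau(B_0')\big] < \infty,
\end{aligned}$$

since the mean exit time from $B_0'$ is bounded above by a constant, uniformly over all initial $x\in B_0'$ and all controls $U\in\mathfrak{U}$, by the weak maximum principle of Alexandroff.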

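Let $(\hat t,\hat x)$ be a point at which $\bar\varphi$ attains its minimum on $[T-1,T]\times\bar B_0$. By Lemma 4.5 and Remark 4.6, the function $(t,x)\mapsto\psi_T(t,x)-\min_{[0,T]\times\bar B_0}\bar\varphi$ is nonnegative on $[T-4,T]\times B_0'$, and by Theorem 4.7 we have

$$\begin{aligned}
\psi_T(t,x) - \min_{[0,T]\times\bar B_0}\bar\varphi &\le C_H\Big(\psi_T(\hat t,\hat x) - \min_{[0,T]\times\bar B_0}\bar\varphi\Big)\\
&\le C_H\Big(\tilde\psi_T(\hat t,\hat x) + \min_{[T-1,T]\times\bar B_0}\bar\varphi - \min_{[0,T]\times\bar B_0}\bar\varphi\Big)
\end{aligned} \tag{4.7}$$

for all $t\in[T-3,T-2]$ and $x\in\bar B_0$. Expressing the left-hand side of (4.7) as

$$\bar\varphi(t,x) - \min_{[0,T]\times\bar B_0}\bar\varphi + \tilde\psi_T(t,x),$$

and using (4.6), Lemma 4.8 follows with $M_0:=(C_H+1)\tilde M_0$. □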
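The proof of Lemma 4.8 relies on the mean exit time from $B_0'$ being bounded uniformly over initial points and admissible controls. As a rough numerical illustration, not part of the argument, the sketch below estimates $\mathbb{E}_0[\tau]$ for a one-dimensional diffusion $dX_t=b(X_t)\,dt+dW_t$ exiting $(-1,1)$, using an Euler-Maruyama Monte Carlo scheme. The function name `mean_exit_time` and the two feedback drifts with $|b|\le1$ are hypothetical choices. For comparison, solving $\tfrac12f''+bf'=-1$, $f(\pm1)=0$, gives exact values $1$ for $b\equiv0$ and $(e^2-1)/2-1\approx2.19$ for the inward drift $b(x)=-\operatorname{sign}(x)$.

```python
import math
import random


def mean_exit_time(x0, radius, drift, sigma=1.0, dt=2e-3, n_paths=1000, seed=0):
    """Euler-Maruyama Monte Carlo estimate of E_x[tau] for the 1-D diffusion
    dX = drift(X) dt + sigma dW, where tau is the first exit time from
    (-radius, radius). Exits are only detected at grid times."""
    rng = random.Random(seed)
    step = sigma * math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, t = x0, 0.0
        while abs(x) < radius:
            x += drift(x) * dt + step * rng.gauss(0.0, 1.0)
            t += dt
        total += t
    return total / n_paths


# Two admissible feedback controls with |b| <= 1 (hypothetical examples).
no_drift = mean_exit_time(0.0, 1.0, lambda x: 0.0)
inward = mean_exit_time(0.0, 1.0, lambda x: -math.copysign(1.0, x))

# Exact values for comparison: 1 and (e^2 - 1)/2 - 1, respectively.
print(no_drift, inward, (math.e ** 2 - 1) / 2 - 1)
```

Because exits are only checked on the time grid, both estimates are biased slightly upward; the qualitative point is that a bounded inward drift lengthens the mean exit time but keeps it finite, in line with the uniform bound used above.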
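Lemma 4.9. Provided $\varphi_0\in C^2(\mathbb{R}^d)$ is nonnegative and bounded, we have

$$\bar\varphi(t,x) - \max_{\partial B_0}\bar\varphi(t,\cdot) \le 2\|\varphi_0\|_\infty + \big(1+\varrho\delta_0^{-1}\big)V^*(x) \qquad\forall x\in B_0^c.$$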



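Proof. By Lemma 4.2,

$$\bar\varphi(t-\tau,x) \le \bar\varphi(t,x) + \varrho\tau + \operatorname{osc}_{\mathbb{R}^d}\varphi_0 \qquad\forall x\in\mathbb{R}^d,\ 0\le\tau\le t. \tag{4.8}$$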
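Therefore, by (4.1) and (4.8), using the fact that $r-\varrho>0$ on $B_0^c$, we obtain

$$\begin{aligned}
\bar\varphi(t,x) &\le \mathbb{E}^{v^*}_x\Big[\int_0^{\breve\tau\wedge t}\big(r(X_s,v^*(X_s))-\varrho\big)\,ds + \bar\varphi(t-\breve\tau\wedge t, X_{\breve\tau\wedge t})\Big]\\
&\le \mathbb{E}^{v^*}_x\Big[\int_0^{\breve\tau}\big(r(X_s,v^*(X_s))-\varrho\big)\,ds\Big] + \mathbb{E}^{v^*}_x\big[\bar\varphi(t-\breve\tau,X_{\breve\tau})\,\mathbb{I}\{\breve\tau<t\}\big] + \mathbb{E}^{v^*}_x\big[\varphi_0(X_t)\,\mathbb{I}\{\breve\tau\ge t\}\big]\\
&\le V^*(x) + \mathbb{E}^{v^*}_x\big[\bar\varphi(t,X_{\breve\tau})\,\mathbb{I}\{\breve\tau<t\}\big] + \varrho\,\mathbb{E}^{v^*}_x\big[\breve\tau\,\mathbb{I}\{\breve\tau<t\}\big] + \operatorname{osc}_{\mathbb{R}^d}\varphi_0 + \|\varphi_0\|_\infty\\
&\le V^*(x) + \mathbb{P}^{v^*}_x(\breve\tau<t)\Big(\max_{\partial B_0}\bar\varphi(t,\cdot)\Big) + \varrho\,\mathbb{E}^{v^*}_x\big[\breve\tau\,\mathbb{I}\{\breve\tau<t\}\big] + 2\|\varphi_0\|_\infty,
\end{aligned} \tag{4.9}$$

for $x\in B_0^c$. Since $-\bar\varphi(t,x)\le\varrho t$, we have

$$-\mathbb{P}^{v^*}_x(\breve\tau\ge t)\Big(\max_{\partial B_0}\bar\varphi(t,\cdot)\Big) \le \varrho\,\mathbb{P}^{v^*}_x(\breve\tau\ge t)\,t \le \varrho\,\mathbb{E}^{v^*}_x\big[\breve\tau\,\mathbb{I}\{\breve\tau\ge t\}\big]. \tag{4.10}$$

Hence, subtracting $\max_{\partial B_0}\bar\varphi(t,\cdot)$ from both sides of (4.9) and using (4.10) together with the estimate $\mathbb{E}^{v^*}_x[\breve\tau]\le\delta_0^{-1}V^*(x)$, we obtain

$$\bar\varphi(t,x) - \max_{\partial B_0}\bar\varphi(t,\cdot) \le V^*(x) + \varrho\delta_0^{-1}V^*(x) + 2\|\varphi_0\|_\infty. \qquad\square$$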

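We define the set $\mathcal{T}\subset\mathbb{R}_+$ by

$$\mathcal{T} := \Big\{t\ge T_0+4 : \min_{[t-1,t]\times\bar B_0}\bar\varphi = \min_{[0,t]\times\bar B_0}\bar\varphi\Big\},$$

where $T_0$ is as in Remark 4.6. By Remark 4.6, $\mathcal{T}\neq\emptyset$.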

Lemma 4.10. Let Assumption 3.2 hold and suppose that the initial condition $\varphi_0\in C^2(\mathbb{R}^d)$ is nonnegative and bounded. Then there exists a constant $C_0$ such that

$$\operatorname{osc}_{B_0}\bar\varphi(t,\cdot) \le C_0 \qquad\forall t\ge0.$$



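Proof. Suppose $t\in\mathcal{T}$. Then, by Lemma 4.8,

$$\max_{\partial B_0}\bar\varphi(t-2,\cdot) - \min_{[0,t]\times\bar B_0}\bar\varphi \le M_0.$$

Therefore, by Lemma 4.9, we have

$$\bar\varphi(t-2,x) - \min_{[0,t]\times\bar B_0}\bar\varphi \le M_0 + 2\|\varphi_0\|_\infty + \big(1+\varrho\delta_0^{-1}\big)V^*(x) \qquad\forall (t,x)\in\mathcal{T}\times B_0^c. \tag{4.11}$$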
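Next, fix any $t_0\in\mathcal{T}$. It suffices to prove the result for $t\ge t_0$, since it trivially holds for $t$ in the compact interval $[0,t_0]$. Given $t\ge t_0$, let $\tau:=\sup\,\mathcal{T}\cap[t_0,t]$. Note then that

$$\min_{[0,t]\times\bar B_0}\bar\varphi = \min_{[0,\tau]\times\bar B_0}\bar\varphi. \tag{4.12}$$

By (4.11)–(4.12), and since $V^*$ is nonnegative, we obtain

$$\begin{aligned}
\sup_{x\in\bar B_0}\bar\varphi(t,x) &\le \sup_{x\in\bar B_0}\mathbb{E}^{v^*}_x\Big[\int_0^{t-\tau+2}\big(r(X_s,v^*(X_s))-\varrho\big)\,ds + \bar\varphi(\tau-2,X_{t-\tau+2})\Big]\\
&\le \sup_{x\in\bar B_0}\mathbb{E}^{v^*}_x\Big[\int_0^{t-\tau+2}\big(r(X_s,v^*(X_s))-\varrho\big)\,ds + V^*(X_{t-\tau+2})\Big]\\
&\qquad + \sup_{x\in\bar B_0}\mathbb{E}^{v^*}_x\big[\bar\varphi(\tau-2,X_{t-\tau+2}) - V^*(X_{t-\tau+2})\big]\\
&\le \|V^*\|_{\infty,\bar B_0} + \min_{[0,t]\times\bar B_0}\bar\varphi + M_0 + 2\|\varphi_0\|_\infty + \varrho\delta_0^{-1}K_0,
\end{aligned} \tag{4.13}$$

with

$$K_0 := \sup_{t\ge0}\ \sup_{x\in\bar B_0}\mathbb{E}^{v^*}_x\big[V^*(X_t)\big].$$

By Lemma 3.5, $K_0$ is finite. Since

$$\operatorname{osc}_{B_0}\bar\varphi(t,\cdot) \le \sup_{x\in\bar B_0}\bar\varphi(t,x) - \min_{[0,t]\times\bar B_0}\bar\varphi,$$

and $t\ge t_0$ was arbitrary, the result follows for all $t\ge t_0$ by (4.13). □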
The following corollary now follows by Theorem 4.7 and Lemma 4.10.

Corollary 4.11. Under the hypotheses of Lemma 4.10, for any $\tau>0$ there exists a constant $C(\tau)$ such that

$$\operatorname{osc}_{[n\tau,(n+1)\tau]\times\bar B_0}\bar\varphi \le C(\tau) \qquad\forall n\in\mathbb{N}.$$



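5. Convergence of the Relative Value Iteration. We define the set $\mathfrak{T}\subset\mathbb{R}_+$ by

$$\mathfrak{T} := \big\{t\in\mathbb{R}_+ : \bar\varphi(t,0)\le\bar\varphi(t',0)\ \ \forall t'\le t\big\}.$$

In the next lemma we use the variable $\tilde\varphi(t,x):=\bar\varphi(t,x)-\bar\varphi(t,0)$.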

Lemma 5.1. Let Assumption 3.2 hold and also suppose that the initial condition $\varphi_0\in C^2(\mathbb{R}^d)$ is nonnegative and bounded. Then

$$\tilde\varphi(t,x) \le C_0 + 2\|\varphi_0\|_\infty + \big(1+\varrho\delta_0^{-1}\big)V^*(x) \qquad\forall (t,x)\in\mathbb{R}_+\times\mathbb{R}^d, \tag{5.1a}$$

and there exists a constant $M_0$ such that

$$\bar\varphi(t,0) - \bar\varphi(t',0) \le M_0 \qquad\forall t\ge t'. \tag{5.1b}$$



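Proof. The estimate in (5.1a) follows by Lemmas 4.9 and 4.10. To show (5.1b) note that

$$\bar\varphi(t,0) - \bar\varphi(t',0) \le \bar\varphi(t,0) - \min_{s\in[0,t]}\bar\varphi(s,0) \qquad\forall t'\in[0,t]. \tag{5.2}$$

Let $t^*\in\operatorname{Argmin}_{s\in[0,t]}\bar\varphi(s,0)$ and define $T:=t-t^*$. Clearly, $t^*=t-T\in\mathfrak{T}$. We have

$$\begin{aligned}
\bar\varphi(t,0) - \bar\varphi(t-T,0) &\le \mathbb{E}^{v^*}_0\Big[\int_0^T\big(r(X_s,v^*(X_s))-\varrho\big)\,ds + \bar\varphi(t-T,X_T)\Big] - \bar\varphi(t-T,0)\\
&= \mathbb{E}^{v^*}_0\Big[\int_0^T\big(r(X_s,v^*(X_s))-\varrho\big)\,ds + \tilde\varphi(t-T,X_T)\Big]\\
&= V^*(0) - \mathbb{E}^{v^*}_0\big[V^*(X_T)\big] + \mathbb{E}^{v^*}_0\big[\tilde\varphi(t-T,X_T)\big]\\
&\le V^*(0) + C_0 + 2\|\varphi_0\|_\infty + \varrho\delta_0^{-1}\,\mathbb{E}^{v^*}_0\big[V^*(X_T)\big],
\end{aligned} \tag{5.3}$$

where the last inequality follows by (5.1a). However, by Lemma 3.5 there exists a constant $M$ such that

$$\sup_{T>0}\mathbb{E}^{v^*}_0\big[V^*(X_T)\big] \le M.$$

It then follows by (5.3) that $\bar\varphi(t,0)-\bar\varphi(t-T,0)$ is bounded above by a constant independent of $t$ and $T$. The result then follows by (5.2). □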



Lemma 5.2. Under the hypotheses of Lemma 5.1, there exists a constant $k_0>0$ such that

$$\mathbb{E}^{v}_x[\breve\tau\wedge t] \le k_0 + 2\delta_0^{-1}\big(1+\varrho\delta_0^{-1}\big)V^*(x) \qquad\forall x\in B_0^c.$$



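Proof. Subtracting $\bar\varphi(t,0)$ from both sides of (4.2), we obtain

$$\begin{aligned}
\tilde\varphi(t,x) \ge \mathbb{E}^{v}_x\Big[&\int_0^{\breve\tau\wedge t}\big(r(X_s,v_{t-s}(X_s))-\varrho\big)\,ds + \tilde\varphi(t-\breve\tau\wedge t, X_{\breve\tau\wedge t})\,\mathbb{I}\{\breve\tau<t\} + \varphi_0(X_t)\,\mathbb{I}\{\breve\tau\ge t\}\\
&- \bar\varphi(t,0)\,\mathbb{I}\{\breve\tau\ge t\} + \big(\bar\varphi(t-\breve\tau\wedge t,0)-\bar\varphi(t,0)\big)\,\mathbb{I}\{\breve\tau<t\}\Big].
\end{aligned}$$

We discard the nonnegative term $\varphi_0(X_t)\,\mathbb{I}\{\breve\tau\ge t\}$, and we use Lemma 4.10 and (5.1b) to write the above inequality as

$$\begin{aligned}
\tilde\varphi(t,x) &\ge \mathbb{E}^{v}_x\Big[\int_0^{\breve\tau\wedge t}\big(r(X_s,v_{t-s}(X_s))-\varrho\big)\,ds\Big] - \sup_{0\le s\le t}\|\tilde\varphi(s,\cdot)\|_{\infty,\bar B_0}\\
&\quad - \mathbb{E}^{v}_x\big[\bar\varphi(t,0)\,\mathbb{I}\{\breve\tau\ge t\}\big] + \mathbb{E}^{v}_x\big[\big(\bar\varphi(t-\breve\tau\wedge t,0)-\bar\varphi(t,0)\big)\,\mathbb{I}\{\breve\tau<t\}\big]\\
&\ge \mathbb{E}^{v}_x\Big[\int_0^{\breve\tau\wedge t}\big(r(X_s,v_{t-s}(X_s))-\varrho\big)\,ds\Big] - C_0 - \bar\varphi(t,0)\,\mathbb{P}^{v}_x(\breve\tau\ge t) - M_0.
\end{aligned} \tag{5.4}$$

By (5.1a) and (5.4), and since $r-\varrho>\delta_0$ on $B_0^c$, we obtain

$$\begin{aligned}
C_0 + 2\|\varphi_0\|_\infty + \big(1+\varrho\delta_0^{-1}\big)V^*(x) &\ge \delta_0\,\mathbb{E}^{v}_x[\breve\tau\wedge t] - \bar\varphi(t,0)\,\mathbb{P}^{v}_x(\breve\tau\ge t) - C_0 - M_0\\
&\ge \Big(\delta_0 - \frac{\bar\varphi(t,0)^+}{t}\Big)\mathbb{E}^{v}_x[\breve\tau\wedge t] - C_0 - M_0,
\end{aligned}$$

where in the last step we used $t\,\mathbb{P}^{v}_x(\breve\tau\ge t)\le\mathbb{E}^{v}_x[\breve\tau\wedge t]$. The result then follows by Lemma 4.1. □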

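Lemma 5.3. Under the hypotheses of Lemma 5.1,

$$\bar\varphi(t,0)\,\mathbb{P}^{v}_x(\breve\tau\ge t) \xrightarrow[t\to\infty]{} 0,$$

uniformly on $x$ in compact sets of $\mathbb{R}^d$.

Proof. For $x\in B_0$ we have $\breve\tau=0$, so there is nothing to prove. By Lemma 4.1, Lemma 5.2, and the bound $\mathbb{P}^{v}_x(\breve\tau\ge t)\le t^{-1}\,\mathbb{E}^{v}_x[\breve\tau\wedge t]$, we have

$$|\bar\varphi(t,0)|\,\mathbb{P}^{v}_x(\breve\tau\ge t) \le \frac{|\bar\varphi(t,0)|}{t}\Big(k_0 + 2\delta_0^{-1}\big(1+\varrho\delta_0^{-1}\big)V^*(x)\Big) \xrightarrow[t\to\infty]{} 0 \qquad\forall x\in B_0^c.$$

□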



Lemma 5.4. Let Assumption 3.2 hold and also suppose the initial condition $\varphi_0\in C^2(\mathbb{R}^d)$ is nonnegative and bounded. Then the map $t\mapsto\varphi(t,0)$ is bounded on $[0,\infty)$, and it holds that

$$-\operatorname{osc}_{\mathbb{R}^d}\varphi_0 \le \liminf_{t\to\infty}\varphi(t,0) \le \limsup_{t\to\infty}\varphi(t,0) \le M_0+\varrho.$$



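Proof. Define

$$g(t) := \inf_{U\in\mathfrak{U}}\mathbb{E}^U_0\Big[\int_0^t r(X_s,U_s)\,ds + \varphi_0(X_t)\Big].$$

By (3.30), $\Phi(t):=\int_0^t\varphi(s,0)\,ds$ solves $\Phi'+\Phi=g$, $\Phi(0)=0$, and therefore

$$\begin{aligned}
\varphi(t,0) &= g(t) - \int_0^t e^{s-t}g(s)\,ds\\
&= \big(1-e^{-t}\big)^{-1}\int_0^t e^{s-t}\big(g(t)-g(s)\big)\,ds + \big(1-e^{-t}\big)^{-1}e^{-t}\int_0^t e^{s-t}g(s)\,ds
\end{aligned} \tag{5.5}$$

for $t>0$. By Lemma 5.1, $g(t)\le M_0+\varphi_0(0)+\varrho t$; since also $g\ge0$, the second term on the right-hand side of (5.5) vanishes as $t\to\infty$. By Lemma 4.2, $g(t)-g(s)\ge-\operatorname{osc}_{\mathbb{R}^d}\varphi_0$ for all $s\le t$. Also, by Lemma 5.1, $g(t)-g(s)\le M_0+\varrho(t-s)$ for all $s\le t$. Evaluating the first integral on the right-hand side of (5.5), and using $\int_0^t e^{s-t}(t-s)\,ds = 1-(1+t)e^{-t}\le1-e^{-t}$, we obtain the bound

$$-\operatorname{osc}_{\mathbb{R}^d}\varphi_0 \le \big(1-e^{-t}\big)^{-1}\int_0^t e^{s-t}\big(g(t)-g(s)\big)\,ds \le M_0+\varrho \qquad\forall t>0. \tag{5.6}$$

The result follows by (5.5)–(5.6). □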
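The identity (5.5) is elementary but easy to get wrong by a sign. The sketch below checks it numerically for a hypothetical smooth function standing in for $g$ (the true $g$ is not available in closed form, so only the algebra is tested). It solves $\Phi'=g-\Phi$, $\Phi(0)=0$, with the classical fourth-order Runge-Kutta scheme, so that $\varphi(t,0)=g(t)-\Phi(t)$, and compares the result with both lines of (5.5), computed by the composite Simpson rule. It also checks the value of the weight $\int_0^t e^{s-t}(t-s)\,ds$ used to obtain (5.6). The helper names are ours.

```python
import math


def g(t):
    # Hypothetical smooth stand-in for g(t); only the algebra of (5.5) is tested.
    return 2.0 + 0.3 * t + math.sin(3.0 * t)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule for int_a^b f with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4.0 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    total += 2.0 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return total * h / 3.0


def phi_at_zero(t_end, n=20000):
    """RK4 for Phi' = g - Phi, Phi(0) = 0; returns g(t_end) - Phi(t_end)."""
    def rhs(s, y):
        return g(s) - y

    h = t_end / n
    y = 0.0
    for i in range(n):
        s = i * h
        k1 = rhs(s, y)
        k2 = rhs(s + h / 2, y + h * k1 / 2)
        k3 = rhs(s + h / 2, y + h * k2 / 2)
        k4 = rhs(s + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return g(t_end) - y


t = 5.0
conv = simpson(lambda s: math.exp(s - t) * g(s), 0.0, t)  # int_0^t e^{s-t} g(s) ds
first_line = g(t) - conv
c = 1.0 / (1.0 - math.exp(-t))
second_line = (c * simpson(lambda s: math.exp(s - t) * (g(t) - g(s)), 0.0, t)
               + c * math.exp(-t) * conv)
ode_value = phi_at_zero(t)
weight = simpson(lambda s: math.exp(s - t) * (t - s), 0.0, t)
print(ode_value, first_line, second_line, weight)
```

The second line of (5.5) is the form that isolates the increments $g(t)-g(s)$, which is what allows the one-sided bounds from Lemmas 4.2 and 5.1 to be used directly.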

Combining Corollary 4.11, the boundedness of $t\mapsto\varphi(t,0)$ asserted in Lemma 5.4, and (1.12), it follows that $x\mapsto\varphi(t,x)$ is locally bounded in $\mathbb{R}^d$, uniformly in $t\ge0$. Recall Definition 3.14. The standard interior estimates for the solutions of (3.12) provide us with the following regularity result.

Theorem 5.5. Under the hypotheses of Lemma 5.4, the closure of the orbit $\{\Phi_t[\varphi_0],\ t\in\mathbb{R}_+\}$ is locally compact in $C^2(\mathbb{R}^d)$.

Proof. By Corollary 4.11 and Lemma 4.10, the oscillation of $\bar\varphi$ is bounded on any cylinder $[n,n+1]\times B_R$, uniformly over $n\in\mathbb{N}$. This, together with Lemma 5.4, implies that $\Phi_t[\varphi_0](x)$ is bounded on $(t,x)\in[n,n+1]\times B_R$, for any $R>0$, uniformly in $n\in\mathbb{N}$. It follows that the derivatives $\partial_{ij}\Phi_t[\varphi_0]$ are Hölder equicontinuous on every ball $B_R$, uniformly in $t$ [14, Theorem 5.1]. The result follows. □

We now turn to the proof of our main result. 

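Proof of the main result. Let $\{t_n\}$ be any diverging sequence, write $v^{t_n}$ for the nonstationary Markov control $s\mapsto v_{t_n-s}$, and let $\nu$ be any limit in the topology of Markov controls (see […, Section 2.4]) of $\{v^{t_n}\}$ along some subsequence of $\{t_n\}$, also denoted as $\{t_n\}$. By Fatou's lemma and the stochastic representation of $V^*$ in Theorem 3.1 we have

$$\begin{aligned}
\liminf_{n\to\infty}\mathbb{E}^{v^{t_n}}_x\Big[\int_0^{\breve\tau\wedge t_n}\big(r(X_s,v_{t_n-s}(X_s))-\varrho\big)\,ds\Big] &\ge \mathbb{E}^{\nu}_x\Big[\int_0^{\breve\tau}\big(r(X_s,\nu_s(X_s))-\varrho\big)\,ds\Big]\\
&\ge \inf_{w\in\mathfrak{U}_{\mathrm{SSM}}}\mathbb{E}^{w}_x\Big[\int_0^{\breve\tau}\big(r(X_s,w(X_s))-\varrho\big)\,ds\Big]\\
&\ge V^*(x) - \|V^*\|_{\infty,\bar B_0} \qquad\forall x\in B_0^c.
\end{aligned} \tag{5.7}$$

The second inequality in (5.7) is due to the fact that the infimum of

$$\mathbb{E}^U_x\Big[\int_0^{\breve\tau}\big(r(X_s,U_s)-\varrho\big)\,ds\Big]$$

over all $U\in\mathfrak{U}$ is realized at some $w\in\mathfrak{U}_{\mathrm{SSM}}$, while the third inequality follows by (3.3). Therefore, by (1.12), (5.4), (5.7), and Lemmas 5.3 and 5.4, we have that

$$\liminf_{t\to\infty}\varphi(t,x) = \liminf_{t\to\infty}\big(\tilde\varphi(t,x)+\varphi(t,0)\big) \ge V^*(x) - \|V^*\|_{\infty,\bar B_0} - C_0 - M_0 - \operatorname{osc}_{\mathbb{R}^d}\varphi_0 \qquad\forall x\in B_0^c. \tag{5.8}$$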
Also, by (5.1a) and Lemma 5.4, we obtain

$$\limsup_{t\to\infty}\varphi(t,x) = \limsup_{t\to\infty}\big(\tilde\varphi(t,x)+\varphi(t,0)\big) \le C_0 + 2\|\varphi_0\|_\infty + \big(1+\varrho\delta_0^{-1}\big)V^*(x) + M_0 + \varrho \tag{5.9}$$

for all $x\in\mathbb{R}^d$.

Hence, by (5.8)–(5.9), if we select

$$c = -\big(\|V^*\|_{\infty,\bar B_0} + C_0 + M_0 + \operatorname{osc}_{\mathbb{R}^d}\varphi_0\big),$$

then any $\omega$-limit point of $\varphi(t,\cdot)$ as $t\to\infty$ lies in $\mathcal{G}_c$ (see Definition 3.14). By Theorem 3.15, if $\varphi_0\in\mathcal{G}_c$, then $\varphi(t,x)\to V^*(x)-V^*(0)+\varrho$ as $t\to\infty$. Since the $\omega$-limit set of $\varphi_0$ is invariant and the only invariant set in $\mathcal{G}_c$ is the singleton $\{V^*-V^*(0)+\varrho\}$, the result follows. □

6. Concluding Remarks. We have studied the relative value iteration algorithm for an important class of ergodic control problems wherein instability is possible, but is heavily penalized by the near-monotone structure of the running cost. The near-monotone cost structure plays a crucial role in the analysis and the proof of stabilization of the quasilinear parabolic Cauchy initial value problem that models the algorithm.

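We would like to conjecture that the RVI converges starting from any initial condition $\varphi_0\in\mathcal{O}_{V^*}$. It is only the estimate in Lemma 4.2 that restricts us to bounded initial conditions. We want to mention here that a related estimate can be obtained as follows:

$$\begin{aligned}
\bar\varphi(t,x) &= \inf_{U\in\mathfrak{U}}\mathbb{E}^U_x\Big[\int_0^t\big(r(X_s,U_s)-\varrho\big)\,ds + \varphi_0(X_t)\Big]\\
&= \inf_{U\in\mathfrak{U}}\mathbb{E}^U_x\Big[\int_0^\tau\big(r(X_s,U_s)-\varrho\big)\,ds + \bar\varphi(t-\tau,X_\tau)\Big]\\
&\ge -\varrho\tau + \min_{y\in\mathbb{R}^d}\bar\varphi(t-\tau,y) \qquad\forall\tau\in[0,t],\ \forall x\in\mathbb{R}^d.
\end{aligned}$$

In particular,

$$\min_{\mathbb{R}^d}\bar\varphi(t-\tau,\cdot) - \min_{\mathbb{R}^d}\bar\varphi(t,\cdot) \le \varrho\tau \qquad\forall\tau\in[0,t],$$

and this estimate does not depend on the initial data $\varphi_0$. This suggests that it is probably worthwhile studying the variation of the RVI algorithm that results by replacing $\varphi(t,0)$ by $\min_{\mathbb{R}^d}\varphi(t,\cdot)$ in (1.7).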
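To give a feel for the suggested variation, the sketch below runs a discrete-time, finite-state caricature of the RVI, $h_{n+1}=Th_n-N(h_n)$, with either $N(h)=h(0)$ (the analogue of subtracting $\varphi(t,0)$) or $N(h)=\min_x h(x)$. This is an assumption-laden toy, not the continuous-time algorithm studied above: the MDP data `P`, `R` are hypothetical, and the helper names `bellman` and `rvi` are ours. At a fixed point $N(h)=\varrho$, so the two variants converge to $V-V(0)+\varrho$ and $V-\min V+\varrho$, respectively; the first of these mirrors the limit $V^*-V^*(0)+\varrho$ of the RVI.

```python
# Discrete-time caricature of the RVI with two normalizations N(h).
# The MDP data P and R are hypothetical.
S, A = 3, 2
P = [  # P[u][x][y]: transition probability x -> y under action u
    [[0.9, 0.1, 0.0], [0.5, 0.4, 0.1], [0.2, 0.3, 0.5]],
    [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1], [0.6, 0.2, 0.2]],
]
R = [[0.0, 0.5], [2.0, 1.0], [5.0, 3.0]]  # R[x][u]: running cost


def bellman(h):
    """(Th)(x) = min_u [R[x][u] + sum_y P[u][x][y] h[y]]."""
    return [min(R[x][u] + sum(P[u][x][y] * h[y] for y in range(S)) for u in range(A))
            for x in range(S)]


def rvi(normalize, h0, iters=500):
    """Iterate h <- Th - normalize(h), starting from h0."""
    h = list(h0)
    for _ in range(iters):
        th = bellman(h)
        offset = normalize(h)
        h = [v - offset for v in th]
    return h


h0 = [4.0, -1.0, 2.5]                  # arbitrary bounded initial data
at_zero = rvi(lambda h: h[0], h0)      # subtract the value at state 0
at_min = rvi(min, h0)                  # subtract the minimum over states

rho = at_zero[0]                       # at the fixed point N(h) = rho
V = [v - rho for v in at_zero]         # solves rho + V = TV with V(0) = 0
print(rho, at_zero, at_min)
```

In this toy both normalizations converge; the point of the remark above is that, in the diffusion setting, the minimum-based normalization comes with the estimate that does not depend on $\varphi_0$.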

Rate of convergence results and computational aspects of the algorithm are open 